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INTRODUCTION 



I . 

A. PRESENTATION OF THE RESEARCH PROBLEM 

This research examines the problem of experimentally 
developing discrete-time model equations to represent, or 
approximate, the input-output behavior of both linear and 
nonlinear systems based on a finite set of sampled 
observations of the system inputs and outputs. 

The traditional approach to the modeling problem 
involves selecting a particular model form, estimating the 
unknown coefficients of the model from the observations, and 
finally verifying the quality of the model. The particular 
model form is commonly chosen for mathematical convenience 
or from some physical understanding of the structure of the 
system. Given a specific model equation to represent a 
system, a number of techniques exist for estimating the 
values of the model coefficients that minimize a function of 
the fitting error between the model and the system. 

We are interested in developing the techniques needed to 
obtain a suitable model when the underlying structure of the 
system is unknown . With a suitable model, a variety of 
current applications can be approached, including the 
detection and evaluation of failures that affect system 
performance. The problem is how to obtain a useful model 
from the available observations of the system. 



There are four key parts of this modeling problem. 

First, the allowed functional forms for the model equation 
must be sufficiently general to permit adequate 
approximation of the behavior of systems of interest without 
requiring an unmanageable number of terms. (The adequacy of 
an approximation is an application dependent consideration 
and will be discussed later.) For most nonlinear systems of 
interest, the discrete-time model form typically used in the 
literature is the Volterra series model. This form does not 
adequately satisfy the accuracy or compact representation 
requirements except for a very restricted class of systems. 

The second key part is the determination of the best 
error minimization technique for use in evaluating any model 
equation. This becomes an area of concern in terms of both 
accuracy and computational efficiency when we are faced with 
finite length data sequences and measurement noise. 

The third key part is the development of a general 
technique for evolving or "growing" a model equation in a 
computationally efficient and accurate manner. Existing 
techniques for approaching this part of the problem are very 
limited as to the functional model form they can handle, and 
often make somewhat artificial assumptions and 
approximations to obtain even a partial solution. The 
result is typically an inferior model of the system, with 
insufficient prediction accuracy or an excessive number of 
terms in the model equation. 
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The fourth part is determining if the obtained model is 
the "best" model, in some sense, for use in a particular 
application. This last point involves an investigation of 
whether any obtained model equation is a preferred 
representation of the system, or just one model from a set 
of functionally equivalent models. We examine each of these 
four areas in this research and extend the existing 
techniques for developing model equations. 

Most researchers have approached modeling from the 
coefficient estimation perspective, and there is a large 
body of literature on techniques for efficiently estimating 
the values of the coefficients of specific models once the 
model form has been chosen. Levinson [Ref. 1], Durbin 
[Ref. 2], Robinson [Ref. 31. and Morf [Ref. 4 and 5] have 
developed computationally efficient "recursive-in-order" 
algorithms for iteratively estimating the coefficient values 
of certain linear models. Recently, Lee [Ref. 6], 
Friedlander [Ref. 7], Perry [Ref. 8], and Parker and Perry 
[Ref. 9] have reported on extensions of these algorithms 
that estimate the coefficient values of a wider class of 
model forms. These techniques are shown to be inadequate 
for the more general problem of an unknown system form. 

We approach the modeling problem from a different 
perspective, that of systematically growing a model that 
minimizes the error residual signal (performance modeling), 
rather than just estimating the coefficients of an arbitrary 



model form. We have shifted our concern to performance 
modeling since we are really interested in the behavior of 
the system, and not the values of the coefficients of one 
particular approximation of it. This may seem at first to 
be a relatively minor difference in approach, but the 
development in the following chapters has uncovered some new 
capabilities for more efficient model determination. 

B. DISCUSSION 

Modern systems are both dynamic and complex, yet they 
generally work by cause and effect, i.e. the set of inputs 
operating on the system produces a set of outputs. 

Knowledge of this relationship between input and output, 
enables us to approach various practical applications. We 
can predict reaction based on the action applied, control 
the output by modification of the input, adapt the 
behavioral characteristics so that a given stimulus will 
result in a desired response, di agnose the cause by 
observing the effect, and detect and evaluate changes 
(failures) in a system's performance. 

For failure detection applications, it is necessary to 
have some standard or reference by which to make the 
determination that a failure has occurred. An often used 
concept provides redundancy in the form of one or more 
additional systems (or subsystems) operating in parallel 
with the original system, compares the various outputs, and 
uses an appropriate criterion to detect a failed system. 
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It is not feasible to have redundant systems in many 
applications so a simulation is used, such as a mathematical 
model that approximates the system's performance 
characteristics. In some cases, this model can be designed 
from a detailed knowledge of both the structure and 
components of the system [Ref. 10], Because extensive 
detail is typically required for creating this type of 
model, we refer to this as "microscopic modeling". But 
there is often insufficient knowledge of the system 
structure and/or component behavior of real world systems, 
or the computational complexity is excessive, and an 
alternate modeling technique is needed. 

One concept used by earlier researchers [Ref. 11 and 12] 
selects a specific model form and uses it to characterize, 
or approximate, the performance of the system from input and 
output measurements of the system. This concept is referred 
to as "input-output" or "macroscopic" modeling, and the 
approximation must be done in some meaningful sense if it is 
to be useful. The choice of this mathematical form 
determines both the quality of the model (extent to which it 
approximates the system behavior) and the meaning of the 
model coefficient estimates. If the mathematical model is 
an exact or equivalent representation of the system, there 
is a correspondence between the set of system parameters and 
the set of model coefficients. Various properties of the 
model and the error residual (difference between the system 
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and realized model outputs under identical input conditions) 
can be used in applications including failure detection and 
evaluation. The characteristics of the measured input 
signal, and of any measurement noise, has a direct effect on 
the model performance^. 

Macroscopic characterization also involves model 
building (growing), which adds terms to a given model 
equation to better fit the observed data. If a model fit is 
not adequate for an application, then the standard technique 
in the literature [Ref. 11] guesses at a "better" model and 
fits to it the same data. This "brute-force" technique 
makes little use of the specific features of the 
unacceptable model, and blindly continues until (hopefully) 
an adequate fit is obtained. 

This technique makes some physical and practical sense 
when dealing with a simple model form corresponding to the 
known form of the system. Examples include linear transfer 
functions (ARMA models) and static polynomials. In these 
cases, each successively larger model provides a better fit 
and the preceding fit can be considered as a reduced order 



1 Two interesting cases regarding the input signal are; 
(1) we have little or no control over the characteristics of 
the input signal, and (2) the input signal (or probe) is 
under our complete control for the system characterization 
process. For the work that follows, we consider case (2) 
with the assumption that the input measurements are 
representative of the normal system operation. Section E of 
Chapter VI examines both of these situations. The impact of 
measurement noise is also addressed in Chapter III. 
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model of the system. This technique is of dubious value 
when the form of the system is unknown. We need to develop 
alternative growth techniques that are more useful in the 
general case . 

A particular recursive algorithm was introduced by 
Levinson [Ref. 1], and adapted by Durbin [Ref. 2], to 
efficiently obtain the solution of a specially structured 
set of linear equations. In using this algorithm, a crucial 
simplification was often made by earlier researchers [Ref. 
4-9, 13 and 14 ]. By limiting the form of the model and 

assuming that the input sampled data sequence is ergodic, a 
special mathematical structure can be induced into the model 
evaluation equations, and exploited to save a significant 
amount of mathematical computation. This ergodic assumption 
has been rationalized from different points of view, and has 
resulted in various related evaluation techniques. The 
simplification will be examined in detail in Chapter III, 
since its true effects do not appear clearly in the 
literature. While the simplification appears reasonable in 
the isolated context in which it was made, it will be shown 
that the net effect of model evaluation using this 
simplification is typically that the obtained model is 
suboptimal in prediction performance. 

Even without these assumptions, the recursive solution 
algorithm has limited usefulness. While feasible for 
typical linear model forms, it is shown that the recursive 
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algorithm rapidly becomes computationally prohibitive when 
considering general nonlinear models. Alternative model 
growth techniques are therefore needed. 

It is often overlooked that from the same input 
sequence, different system equations can give the same 
output sequence. The result is indistinguishable 
performance characteristics from structurally different 
systems, or equivalently, multiple different (and often 
independent) model equations each adequately describing the 
performance characteristics of a single system. 

This point will be discussed further because it has 
implications for the fault detection and evaluation 
application. Using the integer n as the discrete time 
index, the system input sequence is denoted as (u(n)}, the 
output sequence as ty(n)}, and the residual sequence due to 
inaccuracies in the model equation as (e(n)}. Assume a 
suitable model has been obtained and is used in conjunction 
with the real system as shown below. 



Input sequence (u(n)} 



SYSTEM 



Output Sequence (y(n)} 



MODEL 






Residual Sequence (e(n)} 



Figure 1: Configuration for Fault Detection 
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A significant change in the characteristics of the error 
residual sequence {e(n)} could be used to indicate that 
there has been a change in the behavior of the system (fault 
detection). There are applications where we would like to 
uniquely determine the change in the system (fault 
evaluation). This last capability requires that there exist 
a unique one-to-one relationship between the value of the 
system parameter that changed, and the resulting value of a 
model coefficient. This is obviously not possible when 
there are two or more structurally different but 
equivalently performing model equations. 

The existence of a model equation that exactly describes 
a finite set of input-output measurements does not imply any 
uniqueness properties [Ref. 15]. This can be demonstrated 
by considering particular examples of structurally different 
but equivalently performing models for a given input. 

Exampl e 1.1: The time series of measurements 



{u(n) ;n=0,1 ,2.3 7 } = {1,1.3.7.17,4 1,99,239,...} and 

Cy(n) ;ns0,1 .2,3 7 } = {0,1,2,5,12,29,70,169,...} are 



equivalently described over any interval with n>1 by both of 
the linearly independent model equations; 

y(n) = u(n) - y(n-1) {1.1} 

and 

y(n) = u(n-1) y(n-1) {1.2} 
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Example 1 . 2 : The time series of measurements 



{u(n) ;nsO, 1 ,2 7,...} = 1 1,1,-1,2,-5.10,-22,47,...} and 

ty(n) ;ns0, 1 ,2 7,...} = { 0 , 1 , 0 , 2 , -3 ,7 , - 15 , 32 , . . . } are 

equivalently described over any interval with n>2 by both of 
the linearly independent model equations; 

y(n)=u(n)+y(n-1) {1.3} 

and 

y(n) = u(n-2) - y(n-1) + y(n-2) {1.4} 

Exampl e 1.3 : The equation 

y(n) = .9u(n) - . 5u ( n- 1 ) y ( n- 1 ) {1.5} 

and the equation 

y(n) = .9u(n) - . 45u ( n- 1 ) u ( n- 1 ) + . 25u ( n- 1 ) u ( n-2 ) y ( n-2 ) {1.6} 

can be realized as shown in Figures 2 and 3 respectively. 




Figure 2: A block” diagram realization of Equation {1.5} 




Figure 3: A block diagram realization of Equation {1.6} 
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These two different equations will produce the identical 
output sequence { y ( n ) ; S < = n < = T } for any given input sequence 
{ u ( n ) ; S< = n < = T } . Note that {1.5} and {1.6} are not 
independent since {1.6} can be obtained from {1.5} directly 
by the following recursion. 

Start by replacing n by n-1 in {1.5}. 

y ( n- 1 ) = . 9u ( n- 1 ) - . 5u ( n-2 ) y ( n-2 ) {1.7} 

Substituting {1.7} into the right side of {1.5} yields: 
y(n) = .9u(n) - .5u(n-1) [.9u(n-1) - . 5u ( n-2 ) y ( n-2 ) ] 

= . 9 u ( n ) - . 45u ( n- 1 ) u ( n- 1 ) + . 25u ( n- 1 ) u ( n-2 ) y ( n-2 ) {1.8} 

Since {1.8} is the same as {1.6}, therefore {1.5} and {1.6} 
are not independent. There could be the case where {1.5} 
was the system and {1.6} was the model, or vice-versa. A 
change in one system parameter would not be uniquely related 
to a resulting change in one model coefficient. This also 
explains why an experimentally obtained model may explain 
predictability, but not cause and effect. 

Measurement data of a physical system can be matched by 
more than one model equation for a number of reasons. It 
may be by coincidence; the input may keep the output of each 
model exactly the same.' It was recently shown [Ref. 16] 
that if the system is linear, the existence of linearly 
independent models that match the input-output data can be 
attributed to the particular inputs that generate the 
measurements from which these independently equivalent 
models are computed. 
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The preceding discussion shows that the following 
problems must be considered. If we obtain a set of models 
each acceptably describing the input-output performance of a 
system, can we determine ij^ any of these models include the 
structural properties of the actual system that produced the 
data? Naturally, we should expect that we will rarely be 
able to obtain a model with the same detailed structure as 
the system. The second problem is how to determine the 
"best" model for a particular application from those 
candidate models in the equivalence set. Chapter VIII will 
address these problems and present some new results in the 
context of a particular application. 

C. OVERVIEW 

Chapter II provides a review of existing general linear 
and nonlinear model forms and presents an extension to a 
unifying general model that we will use. Chapter III 
presents various error minimization techniques for 
evaluating candidate model equations and proves the 
generally superior modeling performance of one particular 
technique known as the "Covariance" least squares method, 
over the least squares technique commonly used in the 
literature. Additive measurement noise is examined, and new 
expressions are developed for the resulting distortion of 
the fitting error and of the coefficient estimates of linear 
recursive models. 
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Chapter IV discusses the equations required to evaluate 
the performance of a model as more terms are included, and 
presents a recursive technique for evaluating the solution 
of a large and useful class of equations. The computational 
advantages of this technique are compared with the direct 
least squares evaluation of each model. 

Chapter V discusses the existing model growth techniques 
based on the parameter estimation approach. It shows that 
the recursive technique of Chapter IV reduces to the 
commonly used parameter estimation technique known as 
Levinson's Algorithm when two restrictive assumptions are 
made. The use of these assumptions typically produced 
inferior models compared to those produced by the more 
general technique. Chapter V also discusses several 
pos si b 1 e 4 non 1 i ne ar model growth algorithms that are logical 
extensions of existing linear growth techniques. 

Chapter VI presents a new concept in iterative model 
growth based on the developments in the preceding chapters, 
and analyzes the advantages and limitations. This heuristic 
technique is shown to offer significant improvements over 
the existing and previously discussed methods. Chapter VII 
gives the results of computer simulations and real world 
experimental comparisons of the model growth techniques 
developed in this research. 

Chapter VIII examines three additional applications for 
the modeling methods developed in this thesis. They are 
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fault detection, fault evaluation, and reduced order 
modeling. Specific techniques are discussed and a number of 
concepts are proposed. Conclusions are given on the key 
results of this work and areas for future research are 
outli ned . 
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II . CHOICE OF THE MODEL EQUATION FORM 



A. EXISTING MODEL FORMS 

We are concerned with the determination of discrete-time 
models for both linear and nonlinear systems. The logical 
starting point is a discussion of existing linear model 
forms. We limit discussion to single-input, single-output 
systems for simplicity, but using a vector notation, the 
results can be directly extended to the multiple-input, 
multiple-output case. We also limit consideration to models 
whose input-output relationships can be described by time- 
invariant difference equations . 

After a discussion of linear models, the few general 
dynamic nonlinear models forms found in the literature are 
presented. A more general nonlinear model form that 
subsumes the preceding linear and nonlinear models is then 
discussed. One particular version of this general form is 
then developed in greater detail, and utilized in the 
remainder of this work. 



2 It is recognized that there are systems where an input- 
output relationship cannot be exactly described by a 
difference equation. Consider a discrete quantizer whose 
output y(n) at any instant n is equal to the integral part 
of the input u(n) if the fractional part of u(n) is less 
than 0.5, and is equal to 1 plus the integral part of u(n) 
if the fractional part of u(n) is greater than or equal to 
0.5. Clearly the input-output relationship of such a system 
cannot be accurately described by a difference equation. 



Linear dynamic discrete models include the moving 
average (MA), autoregressive (AR), and autoregressive-moving 
average (ARMA) forms. They are well described in the 
literature [Ref. 11. 17, and 18] and are briefly discussed 

here for completeness. 

The MA model predicts the current value of the output of 
a system as a weighted summation of the current and q 
consecutive preceding values of the system input, where q is 
known as the order (or memory) of the MA model. Following 
the previously used convention, the sampled observations of 
the system input are denoted as {u(n);S<=n<=T}, the system 
output as { y ( n ) ; S< = n< =T } , and the residual error due to 
inaccuracies in the model as { e ( n ) ; S< = n< = T } . The model 
equation can be written as; 



y(n) = a^\o) u(n) + a^(1) u(n-1) 



( q ). 

a ( q ) 



u ( n-q ) + e ( n ) 





u ( n-i ) + e ( n ) 



{ 2 . 1 } 



(q) 

The coefficients are the a (i) factors that multiply each 
corresponding (i) C ^ delayed input term. The (q) superscript 
is used to emphasize the dependency of each coefficient 
value on the order of the model, and the superscript 
notation is dropped when no ambiguity results. 

These models are called moving average because the 
current output is a weighted average of a finite "window" 
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passing over the present and past input values. Models of 
the form of Eq. {2.1} are denoted as MA(q). 

The p c ^ order AR model predicts the current value of the 
system output as a weighted summation of p consecutive 
preceding output values. Using similar notation, the 
equation for this model is written as; 

(P) (P) (p) 

y(n) = b (1) y(n-1) + b (2) y(n-2) +...+ b (p) y(n-p) + e(n) 

P (p) 

= Y b (j) y(n-j) + e ( n ) {2.2} 

j = 1 

(p) 

The coefficients are the b (j) factors that multiply each 
. th 

corresponding (j) delayed output term. Models of the form 
of Eq. {2.2} are denoted as AR(p). 

Despite their simple form, MA(q) and AR(p) modeling of 
even simple linear systems often requires an excessively 
large number of model terms (a high order model). A natural 
extension of these two models is a combination of both. 

Such mixed models are called autoregressive-moving average, 
or ARM A, models of orders p and q, and are often written as 
ARMA(p.q). The ARMA model predicts the current value of the 
system output as a weighted summation of the current value 
of the system input, q consecutive preceding values of the 
system input, and p consecutive preceding values of the 
system output. Combining the previous notations produces 
the model equation; 
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+ e ( n ) 



^.(q) _.(q) (q) 

y(n) = 0 ^ ( 0 ) u(n) + 0£ ( 1 ) u(n- 1 ) +... + 01 (q) u(n-q) 



oCp) o kpj 

(3 (1) y ( n- 1 ) + /3 (2) y ( n-2 ) + ... + /5 (P)y(n-p) 



,(P) 



(P) 



q (q) P _(p) 

= r 0/ (i) u(n-i) + Y (3 (j) y(n-j) + e(n) {2.3) 

i =0 j = r 

(q) iq (p) 

The coefficients are the a (i) and (3 (j) factors that 

multiply each corresponding (i ) t * 1 delayed input term and 
, „ t h 

(j) delayed output term, respectively. Real world linear 
systems typically include feedback, and can be adequately 
modeled with the smallest number of terms by an ARMA model. 

The literature discusses two general dynamic discrete- 
time nonlinear models. The Voltera mkdel (Ref. 19 , 20 and 
21 ] can be thought of as a nonlinear generalization of the 
MA model. This model predicts the current value of the 
system output as a linearly weighted summation of increasing 
degree products of the current and m consecutive preceding 
input values. Using the typical notation followed in the 
literature as a guide, the equation for this model form is 
wr i tten as ; 

d 



y(n) = V f[u(n-i);i= 0 , 1 , 2 ,...,m] + e(n) 
k = 1 * 



{2.4} 



where 



mm k 

Z. Z ••• Z W s 2 e k' TT 

= 0 2 =i? a =a * 1 = 1 



V° g 2 = g l V S k-l 



12.5} 



th 



and a. (g ,g g i ) is called the k degree Volterra kernel 

k 1 2 k 



29 



When the degree d equals 1, Eq . {2.4} reduces to the form of 

the MA model. 

The Volterra model is based only on a sum of products of 
past and present input values. Because of this nonrecursive 
form, the Volterra model is unable to compactly represent a 
system that includes significant feedback. A model of the 
form of Eq. {2.4} is denoted as VOL(d,m). The lower limits 
of the summations in Eq. {2.5} were purposely chosen to 
eliminate redundant terras, and therefore we have minimized 
the number of equations that need to be solved in the 
evaluation of any particular VOL(d.m). The upper summation 
limits of Eq. {2.5} are all set equal to the integer m for 
notational clarity at this point. We could, of course, use 
a more general notational convention for the upper summation 
limits (e.g. ra ; i = 1 , 2 ,.•*.) . A more general upper summation 
limit notation would produce more complexity in the 
equations, and offers no specific advantages for the problem 
examined in this thesis. The Volterra model of a system may 
not require all of these terms indicated by Eq . {2.5}. 

The Bilinear model [Ref. 22, 23. 24 and 25] predicts the 
current output of the system as a linearly weighted 
summation of the current and m consecutive preceding input 
values, plus a linearly weighted term composed of the 
product of one of m preceding output values with the current 
or one of m preceding input values. Using the typical 
notation found in the literature, the equation for this 
model form is written as; 
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{ 2 . 6 } 



y ( n ) 



T a (m) (i) uCn-1 ) 
i = 0 



+ 



£ £ (m) 

> > c (i,j) u(n-i)y(n-j) 

i = 0 j=1 



e ( n ) 



This form includes bilinear terms composed of the 
products of specific input and output factors, a feature not 
found in the previously discussed model forms. However, it 
is limited to the one type of nonlinear form shown above. 
Models of the form of Eq . {2.6} are denoted as BIL(m). . The 

restriction to equal upper summation limits is again used 
for clarity. The Bilinear model of a system may not require 
all of these terras indicated by Eq . {2.6}. 



B. DEVELOPMENT OF A MORE GENERAL MODEL FORM 

When used for the modeling of a typical nonlinear 
system, the Volterra model form suffers from the same 
limitations as the MA model form does for linear systems. 
The existence of any feedback in a system will usually 
result in the requirement for the order m to be very large 
in Eq. {2.1} in - the linear case, or in Eq. {2.4} in the 
nonlinear case. This property of nonrecursive model forms 
results in the need for an unacceptably large number of 
model terms to adequately represent the behavior of typical 
systems . 
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A natural extension of the V olterra model is to include 
Vo 1 t er r a-1 i ke terras of the output of the system, in a manner 
similar to the relationship between the MA and the ARM A 
models. An investigation of the effect of feedback in some 
common nonlinear systems leads to the conclusion that it is 
also useful to include terms that are extensions of the 
Bilinear model form. Such an extension has been made and 
some partial results concerning different versions of this 
new model form have been published [Ref. 9, 26 and 211 . 

One version of this model form was called the Nonlinear 
ARMA model in references 9 and 26. To better distinguish 
the properties of a more general form of the model, it is 
denoted as the Bivariate Volterra Model (BVM) in reference 
27 and in the work that follows. 

The coefficient notation of the previously discussed 
linear and nonlinear models forms follows the conventions 
found in the literature. Considerable thought was given to 
the need for suitable notation for the more general and 
complex model form, and also for the developments that 
follow. It was decided to have a uniform coefficient 
notation that could be applied to any of the models of this 
chapter. Following is a compact equation for BVM(d,m), a 
3VM of degree d and memory m in terms of this uniform 
coefficient notation. 
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d 



m m 



y <n > * 7 7 E ••• E e r;0 (g 1 ’ s 2 g r > Tl u < "*8 i > 

r=, L g l = ° S 2 = S 1 V S r-l 1=1 J 

m m s 

E„ E ' E 9 0;s <h l’ h 2 Mfl y <"-V 

'l = 0 h 2 =h l h s =h s-l 3 = 1 



d - 1 d-r 

- E E 

r = 1 s = 1 



m m m ram ra 

E„ E ••• E E. E ••• E. 



8 1 = ° g 2 = g l S r= g r-1 V V h l 



^iTjS * * * * * * * * * ^ s )nu(n-g, )JTy (n - h i > 

1 i=1 1 j = 1 J 



h s= h s-l 



+ e ( n ) 
{2.7} 



The coefficients are the factors starting with 9 n , 

r , u 

9 , and 9 in Eq. [2.7). Note that two subscripts and 

II , S L y ^ 

one or more parameters in parenthesis are included for each 

coefficient. A 9 n coefficient is used in conjunction with 

a terra composed exclusively of r products of past and 

present input factors. Likewise, a . 9^ coefficient is used 
* 0 ; s 

in conjunction with a term composed exclusively of s 

products of past output values. The subscript in each of 

these cases distinguishes the number of such factors in the 

corresponding model term. Finally, we use 9 for the 

r , s 

coefficient of the model term composed of r input factors 
and s output factors. In all cases, the parameters in 
parenthesis in each model coefficient distinguish the 
particular lag factors. 

A model of this general form can be completely described 
by either specifying a particular degree d and memory m, or 
by just providing the distinguishing parameters and 
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subscripts of each desired coefficient. Following is an 
example of a second degree, first order BVM equation which 
is denoted as BVM(2,1). 

y(n) = 9 1;0 (0) u(n) + 9 l;0 (1) u(n - 1) + 9 2 ;o ( 0,0) u(n)2 
+ 9 2 , 0^°’ 1 ^ u (n) u (n-1) + 9 2*0^ 1,1 ^ 

+ 9 ^.^(0,1) u(n)y(n-1) + 9 ^ ^ ( 1 , 1 ) u ( n- 1 ) y ( n- 1 ) 

+ 9 0;2 (1) y (n " 1)2 + 9 o- 2 (1,1) y (n " 1)2 + e(n) *2.8} 

This coefficient notation completely specifies the model 
terms, as demonstrates in the following examples. 

0 (1,2,3) is the coefficient of term u ( n- 1 ) u ( n-2 ) u ( n- 3 ) 

J y U 

9_ (0,1,1) is the coefficient of term u ( n ) u ( n- 1 ) y ( n- 1 ) 

4 » 1 

The choice of the various lower summation limits in Eq . 
{2.7} eliminates redundant model terms. The upper summation 
limits are set equal to m for notational clarity, as was 
done for the VOL and BIL forms. Because the upper summation 
limits of Eq. {2.7} were all set equal to m in the preceding 
pages, a compact expression for the number of coefficients 
in a full BVM of degree d and memory m can be written as; 

r s r s^ 

a n ( m + i ) d rr (m-1+j) d-1 d-r TT (m+i ) [ | (m-1+j ) 

c(d,m)=V^ i = 1 + Y" 1 + V" 1 V* LiJ IfJ 

r! s! 2_. Z_i r! s! 

r = 1 s=1 r = 1 s = 1 {2.9J 

This equation is used in subsequent chapters when the 
computational complexity of evaluating this full model form 
is considered. 
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The BVM form defined in Eq . {2.7} is limited to the 

summation of products of integer powers of past and present 
input terms, and past output terms. Other functional forms 
besides sums of integer products are possible, but this form 
appears to be most tractable for our modeling. Examination 
of Eq. {2.1} through Eq . {2.6} confirms that the BVM form 

subsumes the AR , MA , ARMA, VOL, and BIL model forms. 

For example, an ARMA(p,q) is subsumed by a BVM(1,m) when 
the degree d of the BVM is set equal to 1, and 
m = maximum (p,q). This only allows terms with descriptive 

coefficients 0. (i) or 9 n ..(j); where 0<=i<=m and 1<=j<=m. 

1 ;0 u , j. 

This includes all the terms of an ARMA(p,q). 

The BVM form is emphasized because it includes the other 
general forms discussed in the chapter and commonly used in 
the literature. Using this BVM form rather than the ARMA, 
VOL or BIL forms, typically produces a more compact and 
accurate representation of nonlinear systems with feedback. 
Chapter VII contains several computer simulated experiments 
demonstrating this point. 
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III. EXAMINATION OF ERROR MINIMIZATION TECHNIQUES 



A. DISCUSSION 

Assume we are given any particular model form linear in 
some finite number c of unknown coefficients, and a set of 
input-output data of length N>c. We are interested in 
determining the particular model equation that, in some 
meaningful sense, best approximates the performance of the 
system that produced the N output measurements from the 
corresponding N input measurements. There are different 
error criteria, including least squares and minimax, that 
could be used in minimizing the error residual. Least 
squares techniques minimize the average squared residual 
sequence value in a given interval, while miniraax techniques 
minimize the maximum absolute value of the residual sequence 
in the interval. 

Difficulties with least squares, including degraded 
modeling performance under noisy conditions, have been 
reported [Ref. 13]. Nevertheless, it has been decided to 
investigate the use of least squares techniques for the 
following reasons; (1) least squares minimization for models 
linear in the coefficients leads to a set of tractable 
linear equations in the unknowns, (2) there exists a large 
body of parameter estimation techniques in the literature 
based on least squares, and it is possible to extend some of 
these for our model growth and evaluation problem. 
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This chapter presents both the theoretical differences 
and results of computer simulated experimental comparison of 
various least squares formulations when applied to systems 
characterization. In the simulation study, two criteria 
used for comparison purposes are: (1) average squared 

residual value (fitting error), and (2) accuracy in 
estimating model coefficient values. An examination of the 
effect of additive output noise is presented in the last 
section . 

An example of a 1 i ne ar-i n-t he-coe f f i c i en t s nonlinear 
difference equation using the coefficient notation 
introduced in Chapter II is; 

y ( n ) = Q (1) u ( n ) + 0 ft .,(1,1) y(n-1)y(n-1) +0 (0,1) u(n)y(n-1) 

1»Q U,4 -L,x foil 



Eq . (3.1} contains both linear and nonlinear terms in the 

input u(n) and output y(n). Since the coefficients 9 r . s all 
enter in a linear fashion, this equation can be expressed in 
compact vector notation (all vectors in this thesis are 
column vectors). Defining a coefficient vector 0, where 



£ = t 9 0;2 <, ' ,) ’ 9 l;l <0 - ,)1 

and a term vector _x(n), where 
x_(n)^ = [u(n-1),y(n-1)y(n-1),u(n)y(n-1)] 

Eq. { 3 - 1 } can now be expressed in the vector form; 



(3.2} 



{3.3} 



y(n) = 9 T x_( n ) (3-4} 

Assume that we are given a finite set of measurements of 
the input sequence { u ( n ) ; S< = n<rT} and the corresponding 
output sequence { y ( n ) ; S < = n < =T } of a time-invariant and 
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causal system of unknown structure. To reproduce the input- 
output behavior of this system within some moderately small 
error, we choose a 1 i ne ar-i n-t he-un known coefficients 
predictor model equation of the following form. 



where e(n) is the equation error of the model at time n; 

9 is a vector of length c containing as yet unknown 
coefficients corresponding to each model term; and jjc(n) 
is a vector of c model terms, each of which is formed 
from the product of a finite number of input and output 
factors from the set: 

[u(n-m) , u ( n-m + 1 ) u(n-1) ,u(n) , y ( n-m ) ,y(n-m+1) y ( n- 1 ) ] (3*6} 

where m is some finite number called the memory (or 
order) of the model. The maximum number of input 
factors or output factors in any such product 
combination will be called the degree d of the model. 

Note that the above description fits the BVM(d,m) introduced 
in Chapter II. 

The following example is used repeatedly for 
illustration. Consider a linear MA model with q = m = 2 , 
and the coefficient and term vectors; 



T 

y(n) = 9 x(n) + e(n) 



(3.5) 




(3.71 



x(n) T = [ u ( n ) , u ( n- 1 ) , u ( n-2 ) ] 



(3.3} 



This model form is linear in the unknown coefficients 



once the x(n) is specified, and we choose to minimize the 



following nonnegative least squares error criterion; 
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1 



(3-9} 



( n 3~ n 2 + 1 n=n 2 



V 3 t n 2 

L e(n) 



1 V 3 t \ 2 

2 \ e ( n ) 

N n = n 2 



where n 2 and take on fixed integer values. 

To carry out the least squares fit in compact vector- 
matrix notation, we use underscored lower case letters to 
represent vectors and capital letters to represent matrices. 
Scalars are represented with lower case letters whenever 
possible, but occasionally capital letters are used. 

Define the output vector where 



Cy(n 2 ) ,y(n 2 +1 ) y(n 3 ) ] 



{ 3 . 10 } 



and the data measurement matrix X, where 
X T = [x.(n 2 ) x_(n 2 +1) ... {3.11} 

Substituting { 3 . 5 } , {3-7}, {3-10}, and {3-11} into {3.9} yields; 



j 2 =2 (y - x T Q) T (y - x t q) = 2 [ y T y - e T x T _y - y T x£ + q 
N . N L 



T T 
X XQ 



{ 3 . 12 } 



Following standard least squares theory [Ref. 11 - 14], 
the evaluation equation for the coefficient vector 0 that 
minimizes Eq . {3.12}, and the corresponding value of the 

minimum error criterion are determined. The details of the 
well known least squares derivation are included for 
notational development. Differentiating Eq. {3-12} with 
respect to the vector 9 using matrix calculus and equating 
the result to zero yields; 

{3.13} 



3j 2 = 0 = -2 X\ + 2 X T X9 



09 N N 

Simplifying Eq . {3-13} produces; 



1 X T X9 



T 

1 X y 



{3.14} 
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It is convenient to use the following compact notation for 

this set of c simultaneous linear equations. 

= £ {3.15} 

where R = J_ X T X 1 3 . 16} 

N 

is the positive semi -de f i n i te least squares matrix of size c, 

and 1=1 X T Z (3.17) 

N 

is a column vector of size c. The factor 1/N is retained in 

these definitions for subsequent distinction. 

To insure that Eq . (3.13} represents a unique minimum, 

2 

the second derivative of J with respect to 0 must be 
positive. Applying this result to Eq . (3*12} yields the 

added condition that; 

d 2 J 2 = 2 f X T X 1 = 2R > 0 (3-18} 

0« T 3o " 

Equations { 3 . 15} are known as the normal equations, and 

there is a unique solution if and only if matrix R is 

positive definite. This unique solution is; 

£ = R' 1 ^ 13.19J 

Using £3-16}. {3-17}. and (3-19} with { 3 . 12}, the minimum 

2 

value of J is; 

j 2 = j_ y T z - l R-1 £ 

7 

= 1 Z T I " £ T £ {3.20} 

N 
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We now investigate how the effect of the choice of the 

values of n^ and n^ in Eq. { 3 - 9 ) . changes the resulting 

2 

value of the error criterion J . The main purpose of this 

investigation is to identify the reasons for the observed 

numerical differences in the results of various least 

squares formulations that commonly are used in the 

literature. Many recent researchers [Ref. 1 - 9, 13. 14, 

and 18 - 291 put emphasis on computational simplicity and 

make assumptions or approximations related to the values of 

n „ and n. that induce special structure into the solution 
2 3 

equations (3-19} for 9. We consider a number of distinct 
cases that are discussed but not clearly compared in the 
literature . 

(1) If n^<S+m and n^<=T, this is equivalent to the 
assumption that u(n)=0 for n<S and y(n)=0 for n<S, 
and is known in the literature as the " Pr e wi nd owed " 
case [Ref. 4, 6, 28 and-291. 

(2) If n 2 >=S+m and n^T, this is equivalent to the 
assumption that u(n)=0 for n>T and y(n)=0 for n>T, 
is known in the literature as the "Postwindowed” 
case [Ref. 291, and is seldom used. 

(3) If n 2 <S+m and n 3 >T, we get both prewindowing 
and po s t wi n d owi n g since this is equivalent to 
assuming that u(n)=0 for n<S and n>T, and also 
that y(n)=0 for n<S and n>T. This is equivalent 

to rectangularly windowing the measurements. It is 



known as the "Autocorrelation" case [Ref. 7, 14, 25, 

and 28 - 30], and is the typically used method. 

(4) If n 2 >=S+ra and n 2 <=T, no window is applied 

to the observed measurements, and the so called 

"Covariance" case is realized [Ref. 28, 29 and 31]. 

Depending upon the specific choice of n 2 and n^, there are 

many different least squares error criterion values 
2 

J (n^.n^), and related model coefficient estimates ^(n^.n^), 

for a given set of input-output data. The literature 

typically reports the use of the Autocorrelation method for 

statistical considerations and because this can often lead 

to an efficient solution algorithm. This point is discussed 

further in a subsequent section. 

Examination of these four different methods from the 

unifying framework of the least squares equation { 3 - 9 } , 

reveals an interesting comparison basis for explaining the 

subsequent differences in form and performance. This 

development does not appear in the systems identification 

literature and clearly indicates which error minimization 

method should be used for the performance modeling approach 

to the general model growth problem. The main result is 

that the Covariance method generally gives superior modeling 

2 

results in terms of lower fitting error J and more accurate 
model coefficients in the vector £. The differences in 
these four methods are described analytically in terms of 
the following example, generalized in the theorem that 
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follows, and finally demonstrated in a computer simulated 
experiment . 



EXAMPLE: Let S = 1 and T = 10. Then we have the data 

{ u ( n ) } = {u( 1 ) , u ( 2 ) , u ( 3 ) , . . . , u ( 1 0 ) } {3.21} 

(y(n)} = { y ( 1 ) , y ( 2 ) , y(3) y ( 10) } {3.22} 



Let the model be given by the equation: 

y(n) = Q 1;0 (0) u ( n ) + 9 1;0 (1) u(n-1) + 9 1 .q( 2) u(n-2) + e(n) (3 
Using least squares 



.2 . r *3 . ,2 

J = 2 ^ e ( n ) 

N n = n j 

where N = n^-n^+l, and where the coefficient vector is; 

- T = ' 9 l;0 (0) - 9 li0 (,> - 9 i;o< 2) 1 

leads to 

[ X T X 1 £ = J. X T Z 

N H 



(3-24} 



{3-25} 



{3-26} 



where 

y T = C y ( 1 ) , y ( 2 ) , y ( 3 ) y(io)] {3.27} 

and X is the N x 3 data matrix involving {u(n)}, whose 
contents depends upon the choice of n^ and n^ as shown in 
the following four cases. 



.23} 
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Case 1: Prewindowed method 



Heren 2 =S= 1, n 3 = T = 10, 



u( 1 ) 


0 


0 


u(2) 


u ( 1) 


0 


u( 3) 


u(2) 


u ( 1 ) 


u ( 4 ) 


u ( 3 ) 


u(2) 


u ( 5 ) 


u ( 4 ) 


u ( 3 ) 


u ( 6 ) 


u ( 5 ) 


u ( 4 ) 


u ( 7 ) 


u ( 6 ) 


u ( 5 ) 


u ( 8 ) 


u ( 7 ) 


u ( 6 ) 


u ( 9 ) 


u ( 8 ) 


u ( 7 ) 


u ( 10) 


u ( 9 ) 


u ( 8 ) 



and the data matrix becomes; 



(3.28} 











— 






The solution 


of the normal 


equations { 3 • 


26} 


is now given by; 


— 






4° 2 ! 


10 


1 


1 0 ~ 


S l;o (0) 




1 


I u(n) 


y u ( n ) u ( n- 1 ) 


1 1 


y u ( n ) u ( n-2 ) 






1 0 


n = 1 j 10 


n = 2 


! 10 

1 


n = 3 








1 


10 2 


1 0 


1 

1 


a i;o (1) 


= 






V u(n-1) 


1 JL 


y u ( n- 1 ) u ( n-2 ) 








1 To 


n = 2 


1 10 


n = 3 








f 




-i — 














1 


10 2 


9 U0 (2) 






SYMMETRIC 




1 1 
! 10 


t u(n-2) 












n = 3 



10 

Y u ( n ) y ( n ) 
10 n = 1 



10 

y u ( n- 1 ) y ( n ) 
10 n = 2 



10 

y u ( n-2 ) y ( n ) 
10 n = 3 



13-29} 



Note that the square matrix in Eq . (3-29} has different 

summation limits along each diagonal parallel to the main 
diagonal . 



44 



Case 2: Postwindowed method 



1+2 = 3 


f n ^ ~ 


T+ra= 1 0 


u ( 3) 


u ( 2 ) 


u(1)“ 


u(4) 


u ( 3 ) 


u ( 2 ) 


u ( 5 ) 


u ( 4 ) 


u ( 3 ) 


u ( 6 ) 


u ( 5 ) 


u ( 4 ) 


u ( 7 ) 


u ( 6 ) 


u ( 5 ) 


u ( 8 ) 


u ( 7 ) 


u ( 6 ) 


u ( 9 ) 


u ( 8 ) 


u ( 7 ) 


u( 10) 


u ( 9 ) 


u ( 8 ) 


0 


u ( 10) 


u ( 9 ) 


0 


0 


u( 10) 



{3.30} 



The solution of the normal equations { 3 . 26 } is now given by; 

-1 



9 1 ; 0 <0) 



9 1 ; 0 (,) 



9 1;0 (2) 



10 



I 



10 



1 0 



1 £ u ( n ) j 1__ £ u ( n ) u ( n- 1 ) I J__ £ u(n)u(n-2) 



10 n = 3 



10 n = 3 



SYMMETRIC 



I 10 n = 3 



"t ^ 

| 1 0 2 I 1 0 

|1 V u ( n ) II V u ( n ) u ( n- 1 ) 

I 10 n = 2 ' 

-+ 



I 1 0 n = 2 

-I 

I 10 2 

1 1 V u(n) 

I 1 0 n = 1 



— 


1 0 


• 






1 


y u ( n ) y ( n ) 






1 0 


n = 3 










10 








1 


V u ( n- 


1 ) y ( n ) 






1 0 


n = 3 










10 










E u(n_ 


2 ) y ( n ) 




(3-31 ) 


1 0 


n = 3 








square matrix in Eq . 


{3.31} 


has a different 



set of summation limits along each diagonal parallel to the 
main diagonal . 
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Case 3: Autocorrelation method 



Here = S=1, = T+m= 1 0+2= 1 2 , and the data matrix becomes. 



u( 1 ) 


0 


0 


u ( 2 ) 


u ( 1) 


0 


u ( 3 ) 


u ( 2 ) 


u ( 1 ) 


u ( 4 ) 


u ( 3 ) 


u ( 2 ) 


u ( 5 ) 


u ( 4 ) 


u( 3) 


u ( 6 ) 


u ( 5 ) 


u ( 4 ) 


u ( 7 ) 


u ( 6 ) 


u ( 5 ) 


u ( 8 ) 


u ( 7 ) 


u ( 6 ) 


u ( 9 ) 


u ( 8 ) 


u ( 7 ) 


u( 10) 


u ( 9 ) 


u ( 8 ) 


0 


u( 10) 


u ( 9 ) 


0 


0 


u( 10) 



The solution of the normal equations {3*26} is now given by; 











10 2 I 


1 0 


1 


1 0 ' — 


9 i-o 


(0) 




1 


V u(n) 1 1 


y u ( n ) u ( n- 1 ) 


1 1 


y u ( n ) u ( n-2 ) 


X , u 






12 


n = 1 | 1 2 


n = 2 


! 12 


n = *3 










h 

1 


_L,° 2 


1 

1 

1 


1 0 


®l-0 


( 1 ) 


= 






Z u(n) 


■ 

1 J_ 


y u ( n ) u ( n- 1 ) 


X > u 








1 12 


n = 1 


1 12 


n = 2 






























1 

> 


10 2 


9.. n 


(2) 






SYMMETRIC 




1 


y u(nr 


1> 0 













! 12 


L,. 

n= 1 



“ 


10 




1 


z 


u ( n ) y ( n ) 


1 2 


n = 1 






10 




1 


z 


u ( n- 1 ) y ( n ) 


12 


n = 2 






1 0 




1 


Z, 


u ( n-2 ) y ( n ) 


1 2 


nr 3 





t3.33) 

Note that the square matrix in Eq. {3.33} is symmetric, 
Toeplitz (equal values along every diagonal parallel to the 
main diagonal), and the summation limits are all the same 
along any diagonal parallel to the main diagonal. 
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The particular structure of the symmetric Toeplitz 
matrix in Eq. { 3 ♦ 3 3 } was developed here strictly from a 
consideration of the error minimization limits. The 
literature contains numerous references to least squares 
matrices with this special structure, but it is usually just 
stated or developed along different lines^. After 
presentation of the fourth case, we will discuss the 
implications of each. 



3 For example, Baheti [Ref. 23 and 24], Hsia [Ref. 14], 
and Iserman [Ref. 133 all utilize what they call 
"correlation analysis" where they assume that the input and 
output sequences are ergodic, such that this special 
Toeplitz structure results. This "ergodic assumption" can 
be described mathematically as follows [Ref. 14, pp. 44]. 
Consider a finite length d i sc re t e- 1 ime sequence of 
measurements of some signal denoted by (s(n)}. If this is a 
representative sample of an ergodic process, then the 
following condition will hold. The value obtained from the 
expression; 

i + N 

1 y s ( n ) s ( n- j ) 

N + 1 n = i 

is invariant with respect to the integer i. If this special 
condition holds, or is assumed, then the Toeplitz structure 
of Eq . {3.33} will result because of the relationships; 

i+N i+N i+N+1 

1 y s(n)s(n-j) = 1 y s ( n+ 1 ) s ( n+ 1 - j ) = 1 V s(n)s(n-j) 

N+1 n = i N+1 n = i N+1 n = 1 + 1 

i+N i+N+2 

= 1 y s ( n+2 ) s ( n+2- j ) - 1 £ s(n)s(n-j) 

N+1 n=i N+1 ri= i +2 
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Case 4 : 



Covariance method 



1+2=3 . 


n 3 = 


O 

II 

H 


u ( 3 ) 


u ( 2 ) 


u ( 1 ) 


u ( 4 ) 


u ( 3 ) 


u ( 2 ) 


u ( 5 ) 


u ( 4 ) 


u( 3) 


u ( 6 ) 


u ( 5 ) 


u ( 4 ) 


u ( 7 ) 


u ( 6 ) 


u ( 5 ) 


u ( 8 ) 


u ( 7 ) 


u ( 6 ) 


u ( 9 ) 


u ( 8 ) 


u ( 7 ) 


u( 10) 


u ( 9 ) 


u ( 8 ) 



(3-34} 






10 




1 


Z u(n 


) y ( n ) 


8 


n=-3 






10 




1 


t 


-1 )y(n) 


8 


n = 3 






10 




* 


Z u(n 


-2) y( n ) 


^3 


n = 3 





13-35} 

Mote that the square matrix in Eq . { 3 * 35} is symmetric but 

not Toeplitz, and the summation limits are all the same. 

The main reason for the preceding four-case development 
is to point out the specific condition under which the least 
squares matrix becomes Toeplitz. This property is exploited 
in Levinson's algorithm [Ref. 1], which solves the normal 
equations with order of complexity proportional to the size 
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of the least squares matrix R, rather than proportional to 
the cube of the size of this matrix as occurs in the other 
three cases shown. Details of this algorithm are discussed 
later in Chapter V and Appendix B. For models other than 
simple moving average, other researchers [Ref. 3-9 and 21 
- 25] constrained their model forms and used the 
Autocorrelation method to form least squares matrices with 
Toeplitz principle submatrices, and therefore gain some 
computational advantage when solving these equations using 
variations of Levinson's Algorithm. This chapter proves 
that this technique is cumbersome, unnecessary, and more 
importantly, generally produces inferior models compared to 
those obtained by the Covariance method. 

The constrained Autocorrelation method models are 
inferior in two main ways: (1) Only specifically related 

sets of model terms necessary for the special Toeplitz 
structure are allowed in the model. This limits model 
growth flexibility, increases the computational burden, and 
degrades the model performance. Further discussion on these 
points is given in Chapter V. (2) The particular choice of 
data interval described by the Autocorrelation least squares 
method (or its statistical equivalent) typically produces a 
model with significantly higher fitting error, and 
substantially larger coefficient error than the Covariance 
method. This key point is discussed in more detail in the 
next two sections. 
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B. A THEOREM DESCRIBING THE CONDITION FOR SUPERIOR 

PERFORMANCE OF THE COVARIANCE METHOD 

The four previous methods use exactly the same form of 
computation; they differ only in the specific data 
measurements used. The Prewindowed, Postwindowed , and 
Autocorrelation methods supply missing zeros, either before 
or after the measured data, or both. Thus these methods are 
arithmetically equivalent to the Covariance method operating 
on a discontinuous function, and it is well known that it is 
hard for least squares or any other minimization method to 
handle discontinuous functions. An alternate explanation is 
that the first three methods utilize constraints on the data 
values, and it is generally found that a constrained 
solution is inferior to the optimum (minimum valued) 
solution. This suggests that the Prewindowed, Postwindowed, 
and Autocorrelation methods would be inferior to the 
Covariance method. 

Simple computer simulation experiments given in the next 
section confirm this reasoning. It remains, therefore, to 
mathematically express this feeling and these results that 
supplying missing data by a run of zeros is a poor method to 
use. We are, of course, concerned with the quality of the 
fit, the sum of the squares of the residuals. 

The first step is to examine under what circumstances 
the Prewindowed, Postwindowed, or Autocorrelation methods 
would produce a lower average error than the Covariance 
method. Some mathematical notation is needed. 
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Consider a finite set of dynamic input observations 
{ u ( n ) ; S< = n < = T } and corresponding output observations 
{ y ( n ) ; S < = n < =T } of some system, and a linear-in-the- 
coefficients model equation relating the present value of 
y(n) to functions of past values of y(n) and present and 
past values of u(n). Denote the integer m as the maximum 
discrete lag (order) of term of the model equation, and Q as 
the coefficient vector. The model equation can be written: 

y ( n ) =f [£, u ( n-i ) , y (n-j ) ; i = 0 , 1 , 2 m ; j s 1 , 2 ra] +e(n) (3-36} 

2 

Let (e^(n)} represent the error residual and 
represent the average squared error obtained when a least 
squares minimization is performed over the interval (n^.n^). 



l-i- I 



n . 

r 

nfn . 



e (n) 



where n^ = S + m 
and n ^ = T 

and + 1 



13.37) 

(3-38} 

{3-39} 

{3.40} 



Let the length of the error minimization interval be 

increased by ‘a small amount N 2 >0 to a larger region (n^n ) 

where n,<n„ and/or n >n,. This new region includes the 
1 2 4 j 

first region and available data points on either or both 
sides of the first region. Missing data points in the new 
data matrix X are padded with zero values. Using the same 
model form, perform a least squares minimization over the 
interval (n ,n ). Denote the new error residual as {e (n)} 



and the average least squares error as J ^ . 
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V N 2 



V 3 / ^ 2 

I e 2 (n) 

n = n 2 



r n 2 - 1 

- — ! — L 

N 1+ N 2 L n =n 



, v2 £3 

e 2 ( n ) + y e ' 



1 n = n 2 



where N 2 = (n,-n^+ 1 ) - N ^ 



( n ) 2 + £ 4 e 2 („) 2 ] 13.11] 

n = n„ + 1 -* 



{3.42} 



Since is the least squares fit over (n .n^) , it must 
be less than or equal to the quantity; 

2 



n 3 

1 V e ( n ) 
N 1 n=n 2 1 



{3.43J 



Let E be the nonnegative value representing this loss of fit 

n 



- L [ 

N„ run> 



e 2 ( n ) - e^n) J 



{3.44} 



THEOREM 1: 



A necessary and sufficient condition for 

2 2 
J 2 < J 1 

is that the following condition must be met: 
n 0 - 1 



{3.45} 



f [" 2 £ V"> 2 * t 



2 

PROOF : 



1 n=n 3+l 



r . 2 1 , . v 3 / x 2 r N i ■ 

(n) < j_) e 2 ( n ) ~ + 1 

J N^n=n 2 Ln 2 J 



E 

{3.46} 



Substituting {3-37J, {3-41}, and {3-44} into {3-45} yields: 

rn, - 1 



i * n 2v t % 2 r* 3 t £ V 4 ( \ 2 "L i T' 3 

1 2 - + A e ? (n) + > e (n) < 1 ) 

N L + N 2 L n = n L 1 n = n 2 n = n 3 +1^ J n = n 



3 2 2 

e 2 (n) - E 

{3.47} 



Multiply by N i + N ^ and transpose the middle term from the left 



n „-1 9 n 9 f N.+N 1 n_ 7 r -i 

Z * 2 (") . Z 4 »,(■>) < 1 - > Z e 2 <n) * [Vz] E 

n = n^ n=n„+1 l N., J nm, 



2 

{3.48} 
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Dividing both sides by N yields our desired condition. 




It is logical to ask how the condition of Eq . {3.46} 

could arise. The condition states that the average fit over 

the added end regions must be less than the average fit over 

the middle region minus the last term on the right side. 

Since the last terra on the right side of Eq . (3.46} 

would be significant, and the error of the end regions must 

be much smaller than the average error over (n ,n^) for Eq . 

{3*^6} to hold. Two obvious special cases can arise that 

satisfy the condition of Eq . {3-46}. 

[1] In the case where the forced zero-valued data 

points in the expanded region correspond to the actual 

input sequence and the natural system dynamics (and no 

2 

noise), then it follows that E in {3.44} will be zero. 
As an example, consider a stable system excited by the 
following waveform {u(n)}. 




The output {y(n)} might have a similar shape. 
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In the preceding figure, the expanded data region 

contains actual zero-valued input-output values, e^(n) 

2 2 

will be zero in the expanded regions, and <= J^. 

[2] If the first region (n^.n^) contains data values that 
don't exactly fit the model equation, and the additional 
measurements in the larger region (n ,n^) happened to 
contain data that exactly, or almost exactly fit the 
model equation, then the average error over the larger 
region could be lower. 

Both of these special cases are possible, but it appears 
highly unlikely that either will occur in practice. The 
special requirements on the data sequences for these cases 
are examples of pathological situations. The probability of 
their occurrence is so small as not to be meaningful. 

The value of Theorem 1 resides not in the elegance of a 
mathematical proof, but because its proof is so simple and 
its meaning so important. Theorem 1 basically proves that 
any least squares error minimization method other than the 
Covariance method, will produce a higher average fitting 
error in all but unlikely pathological cases. Therefore, 
any systems characterization or parameter estimation 
technique based on a least squares minimization method 
different than the Covariance method (e.g. Prewindowed, 

Postwi ndowed , Autocorrelation, etc) will generally produce 
suboptimal fitting error results. This result is important 
for the work that follows, since it is well known that 
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approximations made early in certain recursive algorithms 
often grow and lead to significant errors later on, and we 
want to use a recursive algorithm to efficiently evaluate 
the model growth. 

The next section provides some computer simulated 
experimental verification of the results of this section. 

Other factors affecting the accuracy of systems 
characterization and parameter estimation are also examined. 

C. SIMULATION EXPERIMENTS 

Experiment 1 : 

DESCRIPTION: An investigation of the effects of various 

least squares error methods, and the length of the observed 
data [S<=n<=T], on the accuracy of the characterization of 
known typical linear and nonlinear systems. 

CRITERION: Square root of the average sum squared fitting 

2 

error-, J. Note that we minimize J but examine J. This is 
done for clarity of graphical presentation. 

For the first part of the experiment, we synthesize the 
MAC 3 ) system ; 

y(n) = I.Ou(n) + .8u(n-1) + .6u(n-2) - .3u(n-3) {3-^9} 

The following Test Procedure is used repeatedly. 

TEST PROCEDURE: 

Generate a random sequence for {u(n)l, uniformly 
distributed between the amplitude limits [-5.5]. and start 
this input through the system (with zero initial conditions) 
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I 



at discrete time n=1. Record the observations 



{ u ( n ) ; S < = n < = T } and { y ( n ) ; S< = n < = T} for S and T specified 
below, and use them to minimize the least square equation 
error of {3*12}. Examine the use of the Prewindowed (P), 
Postwindowed (W), Autocorrelation (A), and Covariance (C) 
methods. The value of S is chosen to be 11, and T varies 
from 50 to 1000 in steps of 50. The experiment is carried 
out over an ensemble of ten ( 10 ) runs, with different, but 
equivalently distributed, random input sequences. For the 
data obtained from the ensemble of ten runs, plot the 
maximum, minimum, and average value of J, as a function of 
the value of T and of the choice of minimization method. 

For the second part of the experiment, synthesize the 
following ARMA(2,2) system and repeat the test procedure. 
y(n) = 1.0u(n) + .SuCn-^l) + .6u(n-2) - .9y(n-1) - .7y(n-2) {3-50} 

For the third part of the experiment, synthesize the 
following BVM system and repeat the test procedure. 
y(n) = 1 . 0 u(n) + , 8 u(n- 1 ) + , 6 u(n- 2 ) -. 9 y(n- 1 ) - , 7 y(n- 2 ) 

+ . 2 u(n)u(n) + . 1 5u ( n- 1 ) u ( n-4 ) - . 3 y ( n- 2 ) y ( n-4 ) 

- .16 u ( n- 1 ) y ( n- 1 ) + . 05 u( n -2 ) y( n-4 ) {3-5U 

Figures 4 through 7 present the maximum, minimum, and 
average values of J versus T and the choice of the least 
squares error minimization method for the MA(3) model of Eq . 
{ 3 . 49 }. As expected, the A, P, and W methods show improved 
performances with increasing T, but even at T=1000, these 
methods are significantly inferior to the C method. 
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J. THE SQUARE ROOT OF THE FITTING ERROR CRITERION 




T, THE UPPER INDEX LIMIT OF THE MEASUREMENT DATA 

Figure 4: PREWINDOWED ANALYSIS OF A MA(3) SYSTEM 

5 7 



THE SQUARE ROOT OF THE FITTING ERROR CRITERION 




T, THE UPPER INDEX LIMIT OF THE MEASUREMENT DATA 

Figure 5: POSTWINDOWED ANALYSIS OF A MAO) SYSTEM 
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THE SQUARE ROOT OF THE FITTING ERROR CRITERION 




T, THE UPPER INDEX LIMIT OF THE MEASUREMENT DATA 

Figure 6: AUTOCORRELATION ANALYSIS OF A MA(3) SYSTEM 
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, THE SQUARE ROOT OF THE FITTING ERROR CRITERION 




T, THE UPPER INDEX LIMIT OF THE MEASUREMENT DATA 

Figure 7: COVARIANCE ANALYSIS OF A MA(3) SYSTEM 

6 0 



It is conjectured that the slight increase in the 
average value of J as T increases with the Covariance 
method, is due to the finite precision of the computer used 
for these experiments. The experiments could be repeated 
using double precision variables in an attempt to verify 
this conjecture. We are actually approximating the N 
equations X£ = y. for the 4 unknowns _Q. Since there are many 
more measurement equations than constraint equations, it is 
natural that the average error should be slightly higher as 
T (and therefore N ) gets larger. 

Figure 8 shows how the choice of the four error 
minimization methods affect the matrices and vectors 
involved in the evaluation of the MA(3) model, for different 
values of T. Note that the R matrix and £ vector have been 
normalized by dividing each element by the first-row, first- 
column entry of R\ This does not affect the answer and it 
provides for an easier comparison of the twelve cases shown. 

Since the Covariance method uses only the exact data 
measurements, we denote as exact the values of R and £ in 
the Covariance method of Figure 8. The corresponding matrix 
and vector in the other three methods can therefore be 
considered to have errors. The important thing to recognize 
here is that errors in the third decimal place in the values 
of the R matrix in these other methods, translate to more 
significant errors in the inverse of R, and ultimately into 



substantial errors in the estimates of J and 0. 
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Figure 8: CONTENTS OF MATRICES AND VECTORS UNDER VARYING 
CONDITIONS, FOR A MA(3) MODEL AND SYSTEM (Note: Matrix R and 
vector r have been normalized by the first row, first column entry of R.) 



\ 







AUTOCORRELATION methoo 


WITH T-50 






COVARIANCE METHOD WITH 


T-50 




“ . i ooooi+t 


. 375071-1 


. 297R9E-1 


. 2 59301 4-0 * 






"" . 1 OOOOE + 1 


.121001-1 .977921-2 


. 272 751+0 * 






.375071-1 


. 100001+1 


. 37507E-1 


.297091-1 






.121081-1 


. t 0060E 4-1 . 977921-2 


. 272751*0 




* • 












R - 










. 29709E-1 


.375071-1 


. tOOOOl+l 


. 17507E-1 






. 97792E-2 


. 104071-1 . 999061+0 


. 207571-1 






_ . 259301+0 


. 297R9E-1 


. 37507E-1 


. tOOOOl+l _ 






_ . 2 72 7 51+0 


.276211-1 .207571-1 


.1021 41+1 _ 




I 1 - 


£ . 974061+0 


. 033041+0 


. 62 6721+0 


. 060021-3 J 




1 T - 


£ . 913 7 IE 4-0 


.014*91+0 . 61 1 3 0E -40 


. 91 391 1-1 J 






“ . t0735E+l 


-. 312561-1 


2 041 IE -1 


276661+0 * 






™ . 107061+1 


-.503231-2 -.452491-2 


-. 207031+0* 






-. M256E-t 


. 1 00311+1 


359271-1 


204 ME-1 




R 1 - 


50434E-2 


.994931+0 -.906741-2 


-.253631-1 






-. 204311-1 


-.1392 71-1 


. 1 00 3 l E + l 


31256E-1 






4 5 2 4 9E -2 


-.906741-2 .100151+1 


-. IR0791-1 






^ - 276661+0 


20431E-1 


31256E-1 


. 107351-1 _ 






2 07031+0 


-.253611-1 -. lR879E-t 


. 105 711 + 1 _ 




J 


- .7109371- 


Condition Nutber • 


. t 7261 RE +1 


J 


- .111 7851- 


5 Condition Rusher • 


. 1 747 741 + 1 


•-M 


. 1 006 561 + 1 


. 7034661+0 


. 5707991+0 


-.1051 701+0 


] 


•-M 


. 1000001+1 


.0000001+0 .6000001+0 


-. 300000E+0 J 






U1T0C0RRELATI0N RETROD 


WITH T-250 




COVARIANCE HETHOO WITH 


T-2 50 


























. tOOOOE+l 


. 5 3 7 2 9r - 2 


. 96696E-1 


. 9R161E-1 






— . iooooe+i 


.107951-2 .945621-1 


. 9 902 01 -2 * 






. 53729E-2 


. tooooi+i 


. 53729E-2 


. 966961-1 






. 1 R795E-2 


.992091+0 -.417121-2 


. 900531-1 




R - 


. 966961-t 


. 53729E-2 


. i ooooi+i 


. 53729E-2 




R - 


. 94 562E -t 


-.417121-2 .907401+0 


-. 71 3511-1 








. 96696E-1 


. 51729E-2 


. t OOOOE ♦ 1 _ 






_ . 9902 OE - 2 


.900531-1 -.713511-2 


. 906471+0 




r T - 


£ .105991+1 


. 7 6 06 81+0 


.607431+0 


-. 2t6l *1+0 J 




I T - 


£ . 105531+1 


.765791+0 .605R6E+O 


-. 21 7641+0 J 




“ . 1 00951+1 


-.402021-2 


97549E-I 


901 64E-2 ” 






"" . 100931+1 


-.133021-2 -.967291-1 


-. 107071-1 * 






-.402021-2 


. 100951+1 


451021-2 


-. 97549E-1 






-. 133R2E-2 


.101661+1 .374591-2 


-. 935841-t 




R- 


-.975491-1 


45102E-2 


. iooooe+i 


402R2E-2 




R 1 • 


96729E-1 


. 374591-2 . 1 02201-ft 


. 001 0tl-2 






-. 901641-2 


97549E-1 


402R2E-2 


. 100951+1 __ 






__ . 1 0707E-1 


-.935041-1 .801011-2 


. 10225E+1 _ 
























J 


- .4077471- 


l Condition Rusher “ 


.1231 5 71 + 1 


J 


- . 1412191- 


5 Condition Rusher - 


. 12194U+1 


•-*[ 


. tooiRip+i 


. 7 R96 ROE -Hi 


. 5079501+0 


3055611+0 J 




. 1 000001+1 


.8000001+0 .6000001+0 


-. 3000001+0 J 






AUTOCORRELATION METflOO 


WITH T • t 00 0 






COVARIANCE METHOD WTTH 


T-l 000 




"" . i oonoi+i 


21069E-1 


. 1 1 25 5E-1 


-. 6 355 7E-1 * 






” . 1 OOOOE 4-1 


-.219451-1 .106081-1 


-. 636591-1 — 






-.210691-1 


. 100001+1 


21069E-1 


. 1 l 2 55E-1 






-. 21945E-1 


.100061+1 -.216251-1 


. 1 1 4 751-1 




R - 


. 1 12551-1 


2t069f-l 


. tOOOOl+l 


21069E-1 




R - 


. 10608E-1 


-.216251-1 .100071+1 


-.208001-1 






_-. 635571-t 


. 1 12551-1 


210641-1 


. tOOOOl+l _ 






-.63659E-1 


.114751-1 -.208801-1 


. 1 0007E+1 




l t - 


£ . 1 00 91 E ♦ 1 


. 76 2 7 RE *0 


. 6005H+0 


366621+0 J 




r A - 


£ . 100791+1 


.762091+0 .599991+0 


-.3672tl+oJ 






“ . 100461+1 


. 20250E-1 


95433E-2 


. 63 41 RE -t “ 






” . 10046E+1 


.2UUE-1 -.086891-2 


. 63401 l-l ““ 




R _l - 


. 202501-1 


. 100101+1 


. 2066OE-1 


-. 954 11E-1 




R' 1 - 


.211141-1 


.100051+1 .211941-1 


-. 968771-1 






-. 9543'l-2 


. 206601-1 


. 100101+1 


. 20250E-t 






- . 006091-2 


. 21 19&E-1 .100031+1 


. 200641 -1 






.6341 «1-1 


954HE-2 


. 202501-1 


. 100461+1 






. 634R1E-1 


-.960771-2 .200641-1 


. 10039E+1 


























J 


- . 1 04 3941 - 


2 Condition Rusher - 


. 1157031+1 


J 


• .2214291- 


5 Condition Nusber - 


. 1 l 57021 + 1 


t r 'l 


£. 10001 61 + 1 


. 7990621+0 


.5997921+0 -.2994221+0 


] 


“- 1 *! 


. 1000001+1 


. 800000E +0 .6000001+0 


-. 3000001+0 J 



Figure 8: (CONTINUED) 



While we are interested in determining the least squares 
method that provides the minimum J, we have also evaluated 
the typical offset in the coefficient estimates that result 
from the use of these four methods, and present this 
information in Figures 9 through 12. The erratic behavior 
of the A method in estimating the coefficient values appears 
to be the result of the padded zeros on both ends of the 
data matrix X. The appearance of similar curves in Figures 
9 through 12 for the A and W cases can be explained as 
follows. Both the Autocorrelation (A) and Postwindowed (W) 
cases have padded zeros at the bottom end of the matrix X 
given by Eq . {3-30} and £3-32} respectively. The effect of 

the padded zeros in the Autocorrelation case X matrix 
decreases as N gets large, and the A and W cases approach 

each other in this limit. 

* 
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Figure 9: COEFFICIENT ESTIMATION OF A MAO) SYSTEM^ 
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Figure 10: COEFFICIENT ESTIMATION OF A MA(3) SYSTEM 
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Figure 11: COEFFICIENT ESTIMATION OF A MA<3) SYSTEM 
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Figure 12: COEFFICIENT ESTIMATION OF A MAO) SYSTEM 
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♦1% ERROR LIMITS 




Figures 4 through 12 show the significant differences 
resulting from the choice of error minimization method. 

This choice directly affects the contents of R and r, which 
in turn affects 0 and therefore J. A logical conjecture is 
that the condition number (ratio of largest to smallest 
eigenvalue) of the matrix R, could be a good indicator of 
the quality of the least squares fit. In other words, the 
more well conditioned the matrix (lower condition number), 
the lower the corresponding fitting error J. While 
esthetically pleasing, this conjecture is not born out by 
experience. In over 30 cases of linear and nonlinear 
systems modeled using each of these four error minimization 
methods, the corresponding R matrices were all well 
conditioned (low condition number), there was no significant 
difference in condition number between the four methods, and 
there was no direct correlation between lowest condition 
numbers and lowest fitting error J. Condition number data 
is included in the typical results of Figure 8. 

This is explained by the following. The fit J is a 
function of the entire coefficient vector Q and the vector 
r. Since the coefficient vector £ is a function of both the 
matrix R (actually the inverse of R) and the vector £, the 
condition number of R is an insufficient measure of the 
accuracy of 9, and therefore is an insufficient measure of 
the quality of the fit J. 
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Figures 13 through 16 present the results of the 
experiment for the ARMA(2,2) model of Eq . {3.50}. Figures 

17 through 20 present the results of the experiment for the 
BVM(2,4) model of Eq. {3*51}. Both of these sets of figures 
indicate the superior performance of the Covariance (C) 
method in minimizing the fitting error criterion J. 
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J, THE SQUARE ROOT OF THE FITTING ERROR CRITERION 
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Figure 13: PREWINDOWED ANALYSIS OF AN ARMA(2,2) SYSTEM 
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Figure 14: POSTWINDOWED ANALYSIS OF AN ARMA(2,2) SYSTEM 
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Figure 15: AUTOCORRELATION ANALYSIS OF AN ARMA(2,2) SYSTEM 
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Figure 16: COVARIANCE ANALYSIS OF AN ARMA(2,2) SYSTEM 

74 



, THE SQUARE ROOT OF THE FITTING ERROR CRITERION 




T, THE UPPER INDEX LIMIT OF THE MEASUREMENT DATA 
Figure 17: PREWINDOWED ANALYSIS OF A BVM(2,4) SYSTEM 
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Figure 18: POSTWINDOWED ANALYSIS OF A BVM<2,4) SYSTEM 
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Figure 19: AUTOCORRELATION ANALYSIS OF A BVM<2,4) SYSTEM 
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J, THE SQUARE ROOT OF THE FITTING ERROR CRITERION 
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Figure 20: COVARIANCE ANALYSIS OF A BVM<2,4) SYSTEM 
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This experiment has lead to a greater understanding of 
the accuracy of the different minimization techniques with 
respect to the size of the observation sequences for a 
nonrecursive model, a recursive model, and a nonlinear model 
of the BVM form. For the models examined, the Covariance 
least squares error minimization method is superior to the 
Prewindowed, Postwindowed , and Autocorrelation methods. 

These results are representative of those obtained with 
other system equations. The conclusion to be drawn from 
simulation Experiment 1 is that the Covariance method is the 
most accurate of the four methods. Since our primary 
problem is accurately characterizing systems whose exact 
mathematical form is unknown, the Covariance method is 
adopted for the rest of the work in this thesis. This 
avoids introducing offset errors in J and £ that might give 
misleading results later in our model growth techniques. 

The next section examines another factor that can affect 
2 

the estimates of J and Q; output measurement noise. This 
is included for convenience at the present time, and is 
referred to again later on. 
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D. EFFECTS OF OUTPUT MEASUREMENT NOISE 

If the system output is contaminated with additive noise 

{v(n)l, then the evaluation of the model and the estimates 

4 

of the model coefficients may be affected . If the additive 
noise is uncorrelated with the system input, and the model 
is nonrecursive (e.g. no ©g.q or 9p;q terms in a BVM which 
thereby reduces to a MA or VOL model), then the effect of 
the additive noise on the coefficient estimates will 
generally be small. This effect approaches zero in the 
limit as the size of the data segment (T-S + 1 ) gets large. 
This property of a nonrecursive model is well known in the 
literature [Ref. 13 and Ref. 14, pp 41 and pp 144]. 

To better understand the effect of uncorrelated additive 
output noise in the case of a linear recursive model , 
denote the noisy output sequence as f z ( n ) ^ ; 

z(n) = y(n) + v(n) for all S<=n<=T 13*52} 

To utilize the equation error minimization techniques 
discussed at the beginning of this chapter, we substitute 
z(n) for y(n) in the evaluation equations, and Eq. {3-5} 
becomes ; 

z(n) = 3 T x(n) + e(n) {3-53} 



4 Other measurement noises such as additive input noise 
or multiplicative input and/or output noise are also 
possible, but are not considered. 

5 The following analysis holds for linear recursive or 
nonrecursive models (e.g. ARMA). There appears no tractable 
way to extend it to recursive nonlinear models ( e . g . 3VM) . 



SO 



where under the condition of additive output noise, 9^ is the 
coefficient vector, and _x(n) is the corresponding term 
vector based on u(n) and z(n), instead of u(n) and y(n). 

The is used to denote factors affected by the noise. 



_x( n) 


{ y( n) } = 


{ z( n) ] 








x( n) 


{ y ( n ) } = 


{ y( n) } 


- x(n) 


- 


u( n) ) = 0 

y ( n ) | = { V ( n ) ! 


_x( n) 


+ 








{3.54] 



Note that if the model is nonrecursive, _x v (n) = (3, 

and x(n) - _x(n). A more interesting example is as 

follows; If _x(n) = [ u( n) , u(n-1 ) , y( n-1 ) ], then 

_x(n) T = [ u( n) ,u( n-1 ) , z( n-1)] = [ u( n) , u( n-1 ) , y( n-1 ) + v( n-1 ) ] 

T 

and _x v (n) 3 [ 0,0,v(n-l) ] 

Substituting Eq . {3-53} and Eq. {3.54! into Eq. {3*9}, 

A 

and minimizing with respect to £ by matrix calculus, yields 
the least squares solution equations; 



r ~T * i * 

1 [ X X ] 9 = 


* T 

1 X z 




{ 3 . 55 } 


N 


N 






where _z^ = [zCn^) 


, z ( n 2 + 1 ) , • • • 


, z( n 3 ) ] 


{ 3 . 56 } 


and 








X T = [ £( n ^ ) 


_x( n 2 + 1 ) 


. x(nj) ] 


{ 3 . 57 ! 


Substituting Sq . 


{ 3 • 54 ] and 


{3.57} into {3.55} yields; 




[ R + R v ] £ = 


£ + £v 




{3.58} 


where the following matrices 


and vectors are defined; 




R = 1 X T X 






{3. 59* 



N 
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{3.60} 



N 



x T - 


[ £( n 2 ) x( n 2 +1 ) ••• _x( n 3 ) 1 


15.61 ] 


T 

X v = 


C £ v ( n 2 ) l v ( n 2 + 1 ) •** 3 


15*62] 


r = 


T 

1 X y 


15*65] 




N 




£v 


T 

= 1 X v v 


15*64] 




N 




T 

v = 


[ v ( n 2 ) » v ( n £ ♦ 1 ) » . • • v ( n 3 ) ] 


15.65] 


Solving 


{3* 58] for the model coefficient vector 9; 






- ]_ 




9 = 


[ R + R v ] [ r ♦ r v ] 


{5.66} 



The first term on the right side of Eq . { 3 • 6 6} can be 

simplified when the inverse exists. 



C * + R v 


r 1 ■ [ s [ 1 


+ r" 1 r v 33 






= [i + R 


"'rJ'V 1 


15-67] 


Substituting 


15-67] into {5 


.66} and using Q=R from 




Eq * 1 5 * 1 9 ] yields ; 

~ r " 1 1" 1 -1 r 

9 = [ I + R R v ] R [ 


£ + £ v -’ 




= [ I + 


R L R v ] 1 R~ 1 r 


+ [ I + R _1 R V ] R _1 £ v 




= [ I + 


-1 i~ i 

R R v ] i + 


r ”1 m ”” 1 — 2. 

[ I + R R v ] R £ v 


15.68] 



Note that when the measurement noise { v ( n) ; S < = n< T ] is equal 
to zero, R v reduces to the null matrix, _r v reduces to the 



null vector, and Eq. { 3 • 6 8 } yields 9 _ - 9_. We are interested 
in the noisy measurement case, and denote the distortion in 
the model coefficients as 9^, where 



id 



9 - 



9 



15 . 69 ] 



Substituting Eq . 15*69] into Eq . 13*68] yields an expression 

for this distortion in the model coefficients. 
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£ d = [ I + R~ 1 R V ] L 9 + [It- R _ 1 R v ] R’ 1 r v - 

= [ I + R _1 R V ]" 1 [ 9 + R' 1 r v ]’ 1 - 9 {3-70} 

Eq . (3-70} gives an exact expression for the coefficient 
distortion due to additive output noise, but its meaning is 
hard to appreciate directly because of the four inversions. 

The first term on the right side can be expanded in a 
geometric series; 

-1 -1 -i -1 +2 -i +3 

[ I + R X R V ] = I - R X R V + [ R R v ] - [ R R y ] + ... {3-71} 

This series is valid when the absolute values of the 

eigenvalues of matrix [ R ^R v 3 are all less than 1. Matrix 

powers greater than one are negligible when the eigenvalues 

are small compared to one. These conditions are met when 

the total power of the additive noise is small in comparison 

to the total power of the system output; i.e. high SNR. 

Using this assumption, Eq . {3-71} is approximated by the 

first two terms of the expansion, and Eq . (3-70} becomes; 

£ d = [ I - R’ 1 R V 3 C £ + R 1 r v 3 - 9 

= £ - r_1r v £ + R ~ l L v ~ r ~ 1r v r " 1 £v * £ 

= R ” 1 [ [ I - R v R _1 ]r v - R 7 0] (3-72} 

The above equation can be interpreted as describing the 
model coefficient distortion vector as composed of the 
difference of two vectors. One is a constant term, and the 
other is a multiplicative function of the noise-free model 
coefficients. Note that the only inversion needed for this 
approximation is that of the matrix R. Also, both vectors 
are directly proportional to the inverse of the matrix R, 
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which is independent of the particular additive noise 
characteri sties . 

The distortion on the estimates of the model coefficients 
is therefore the difference of two vector; 

2c = R_1 I >v - R v R ~ 1 Lv ] = CI - r ~ 1r v ] r-1 J1v {3-73} 



and 



^ m = R R v 2 



where Q , = 9 - 0 _ 

— d — C — a 



(3-74} 
{3-75} 

This shows how the coefficient distortion of a linear 

recursive model depends upon the choice of the particular 

model terms, time averages of the system input and output, 

and time averages of the additive output noise. 

As an illustrative example of the effect of noise on a 

linear recursive model, consider the ARMA( 1 , 1 ) model; 

y ( n ) = 9^(0) u ( n ) + 9^(1) u(n-1) + 9^(1) y(n-1) {3-76} 

The following matrices and vectors are written by inspection; 
.T 



x ( n ) 



x ( n ) 

—v 



[u(n) , u(n-1 ) , y ( n — 1 ) ] 
[ 0, 0, v ( n- 1 ) ] 



u ( n 2 ) 



u( n 2 ~1 ) 



u ( n 2 + 1 ) u(n 2 ) 



y ( n 2 - 1 ) 

y ( n 2 ) 



u(n^) u(n -1) y(n^-1) | 



{3.77} 

{3.78} 



{3.79} 



0 

0 



0 

0 



v ( n 2 - 1 ) 
v ( n 2 ) 



v ( n - 1 ) 



{3.80} 



84 



n 2 } 


u ( n + 1 ) ... u(n. 


3 > 


V” 


u ( n 2 ) ... u ( n 


3 -° 


n 2 -1) 


y ( n 2 ) ... y ( n 


3-’> 


1 


£ 3 u ( n ) y ( n ) 




N 


n = n 2 




1 


T 3 u( n- 1 ) y ( n ) 




N 


n = n 2 




1 


V 3 y( n-1 )y( n) 




N 


n = n 2 





y( n 2 ) 

y ( ri 2 + 1 ) 



L£(n 3 ) | 



r 

— v 



r~ o 

o 



0 

0 



o “1 )fv(n 2 ) — I 



v( n -1 ) v( n ) ... v( n -1 ) 

_ 2 2 3 





o — 




0 




n 3 


1 


Y v( n) v( n- 1 ) 


N 


n= n 2 



v( n 2 + 1 ) 



v( n^) 



'u(n ) u(n 2 +1 ) 

u( n-1 ) u(n ) 

y ( n-1 ) y( n 2 ) 



u(n 3 ) 
u(n 0 -1 ) 

y( n 3 - 1 )_j 



n - 



13.81 } 



{3-82} ' 

’u ( n 2 ) u ( n 2 ~1 ) y( n 2 ~1 ) 

u(n 2 +1 ) u ( n 2 ) y(n 2 ) 



.u(n ) u( n^-1 ) y ( n. - 1 ) . 



i y u(n)< 

N n=n 2 



j_ £ 3 u(n)u(n-l) | j_ £ 3 u(n)y(n-l) 



N n = n 



N n = n , 



1 



ft o 9 

1 f 3 u ( n- 1 r 
N n=n 2 



+ 

I 



SYMMETRIC 



_1_ V u( n- 1 ) y ( n- 1 ) 
N n = n 2 

n 3 2 

J_ £ y( n-1 ) 

N n=n 



{3.83} 
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0 

0 



0 

0 



v ( n - 1 ) v(n^) 



0 

0 



v ( n 3 ~1 



0 

0 



0 v( n -1 )' 
0 v(n 2 ) 



0 v ( n 3 — 1 ) 





— 0 


1 

4- 


0 


1 

— h- 


0 — 


= 


0 


1 

1 

i 


0 


1 


0 




0 


1 

1 

1 


0 


ii 


n o 2 

7 3 v( n- 1 ) 2 




— 


1 




1 N 


*1 

N> 

L 



( 3 . 84 ] 






o 

0 



0 

0 



I 



I 

■ H 



0 

0 



n . 



o - j i r 3 v(„-i > 2 

• N n = n 

2 - 



V 0) - 



9 o ; l ^ 1 ^ 



o 

o 



9 0U (,) i £* 

N n= n ^ 



( 3 - 85 ] 



Representing matrix R of Eq . (5-83] in the following shorthand; 



a I b 1 c 

1 

R = b I d I e 

1 I 

c I e | f 

— i I _ 

the inverse of matrix R can-be written as shown below. 



( 3 . 86 ] 





S J h j k 




(df - e 2 ) / | R | | (ce - b f ) / J R | J (be - c d ) / | R | 
(ce - bf ) / | R | j (af - c 2 ) / | R | _l (be - ae)/|R| 


R - 1 - 


h 1 a j q 

1 


= 




k * q 1 s 

L_ l I 1 




(be - cd ) / | R | j (be - ae)/|R| j (ad - b 2 ) / | R | 



where ] R [ = adf + 2 bec 



c^d - b 2 f - e^a 



( 3 . 87 ] 

( 3 . 88 ] 
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Substituting {3*85} and {3-87} into {3-74} yields; 



£m 



Q n ., ( 1 ) 1 7 3 v(n-1) 2 

U > I TT" 

iN n = n^ 



k 

q 

s 



{3.891 



Substituting {3-82}, {3-84}/, and {3.85} into {3-73} yields; 

k ' 



3 c = 1 V v ( n ) v ( n- 1 ) 
N n = n 2 



_H 3 2 

1 - s_ v ( n- 1 ) 

N n = n 2 



q 

s 



{ 3 . 90 } 



Substituting {3.89} and { 3 . 90} into {3.75} provides an 
expression for the distortion of the coefficient vector; 
n , 



-d = 



y v ( n ) v ( n- 1 ) 
N n = n 2 



n . 2I n 3 2 

'-f S v(n - ,) |- 9 o;l (,) f E v(n - ,) f 

N n = n 2 J N n = n I 



k 

q 

s 

{3.91} 



where k, q, and s are elements of the inverse of matrix R. 

We can now directly examine the effect of additive noise 
on the coefficient distortion vector, in terms of the time 
averages of the additive noise. If the additive output 
noise is ergodic and uncorrelated with itself, the first 
term on the right side of Eq. {3*91} is small and will 
approach zero in the limit as N— ♦OO. The magnitude of the 
coefficient distortion values will be directly proportional 
to both the sample autocorrelation of the output noise and 
the value of the recursive coefficient. Under this 
condition the coefficient distortion vector equals the 
negative of Eq. {3-89}. The value of indicated by Eq. 
{3.91} has been observed in simulation experiments. 
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The preceding example serves to demonstrate the new 
insight that is available as a result of the development of 
equations {3*52} through (3.75). The effect of the presence 
and properties of the additive noise on the distortion of 
the model coefficients follows directly from an examination 
of these equations. The impact of this will be addressed 
again in Chapters VI and VII. 

From the preceding development of an expression for the 
distortion in the model coefficients resulting from additive 
output noise, an expression for the related increase in the 
fitting error can also be obtained. Substitute z(n) for 
y(n) in the development of Eq . {3-20}, denote the minimum 

/«* 

error fitting criterion resulting from the noisy data as J, 
and make use of {3*52} through {3-691. 

?2 . ^"*3 , .2 T I * 

J = £ V z(n) - r. £ 

N n = n 2 

T T 

= 1 [ £ + v ] [y + £]-[_r + ] ( £ + £ ] 

N 

= J_ + 2 v_ T y + _1_ - r_ T © - r. T £ , - r v T £ - £ y T £, {3-92} 

N N N d 

* 2 2 

Denote the distortion in the minimum value of J as J . 

a 

2 A 2 2 

J7 = J z - J z (3-931 

a 

Substituting (3-921 into (3-931 and using (3-201 yields; 

~ i. 1*1 + 1 v * v - £. T £ d ~ £ v * © - C.y T £d (3-94} 

d N M 

Using the assumption that the additive noise (v(n)l is 
ergodic and independent of the system output (y(n)l, the 
first term on the right side of (3-941 is small, and 
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approaches zero as N-^OO. Using this common assumption and 

substituting {3-72} into {3.9*0 yields; 

T T - 1 — l T t 

J, = 1 V A V - r R {[ I - R V R i ]r v - R l £} - r i Q 

N 

- r v T R _1 {[I - R v R ~ 1 ] r y - R v £} {3-95} 



From 
Eq . 



T T — 1 

Eq. (3-191 we have £ = r_ L R . Substituting this into 

(3-951 and simplifying gives; 

1 1 T 1 ~ £ T[ £v ~ R v R X £v ~ R v£ + £v ] " £ v T r_1 I v 
N 

T -1 -1 x _1 

+ Iv R R v R £v + £v R R v£ 

1 v T v + 9 T R v 9 - C r v T R _1 + 2£ T ] [ r v - R^’ 1 ^ ] {3-96} 



Equation {3*96} is a new expression for the distortion 
in the fitting error criterion of a linear recursive model 
caused by additive output noise. Mote that if the model had 
been non r ecu r s i ve , vector r v would reduce to the null 
vector, and matrix R v would reduce to the null matrix. In 
this special case, Eq . (3.96} reduces to; 

t n -5 9 

J = l v r v = 1 T] v ( n ) 

d N M n = n 2 { 3 - 97 } 

and the distortion in the fitting error criterion would be 
equal to the average power of the additive output noise, as 
expected . 

The ARMA model of Eq . (3-76} is used again in an 

illustrative example of the effect of noise on the fitting 
error of a linear recursive model. Substituting Eq. {3.76} 
through { 3 * 88 } into {3-96} produces; 
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j 2 . i y 3 v(„, : 

d X n=n 2 



+ s 



Tf— l 



0 I 0 

— I — 

0 | 0 

— I — 

I 

0 I 0 



\-- 

L-. 



0 

0 



— N n = n 



"3 2 
1 £ v ( n- 1 ) 



r • 


. n 


3 


( n 


) v ( n- 1 ) 1 


g J h ] k 

— + --t — 






T 


- I 0 ! 0 


;_L 7 






+ 


2_9 


1 1 


• N n = 


n 2 V 




J 


















h j m j q 




- 














— T-*r— 

k 1 q • s 

__ l • 
























f 








I I 


— 1 








0 






0 


! 0 , 


0 




g 


! h ! k 

1 1 


0 




— 


0 


! 0 | 


0 




h 


"t — — 

| m | q 










-t — » 








■t — *--- 


-S3 


n- 1 ) 






' 1 n, 

1 1 — 3 


2 






J ! 


y v ( n ) v ( 




0 


0 ; 1 v J 


v ( n- 1 ) 




k 


q 1 s 










' N rv=n 


_ 






1 j 


N n = n 2 
1 








2 









0 

0 



J_ 7 v(n)v(n-1) 

N n = n„ 






* I £ j v(„) 2 .[9 01 (lj i £ 3 vCn -1 )' 

Nn = n M 



2 n . 

N n = n. 



1 7 v(n)v(n-1)| [s + 

LF J L 



2 9 0;l (,) 



n 3 

1 - £ 7 v(n-1 ) 

N n = n 



’1 



( 3 . 93 } 



If the noise {v(n)} is ergodic and uncorrelated with 
itself, the factor pr em'’ 1 1 i pi y i n g the third term on the 
right side of Eq. { 3 - 98 } is small, and approaches zero in 

the limit as N »CQ. Using this common assumption, the 

fitting error distortion reduces to the following; 



j d = 1 £ 3 v(n) 2 * [e ;1 (1)] 1 t 3 v< "-' ) 

N n = n 2 N n = n 2 

S [ 1 ♦ [ 9 0;l (,) ] 2 ] i E 3 » (I ’ )2 

L J N n = n „ 



for large N 



13 - 99 } 
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The preceding equation shows that for uncorrelated additive 



output noise, the increase in the fitting error criterion is 

proportional to both the power of the additive noise, and 

one plus the square of the magnitude of the recursive 

coefficient. If the model form had more than one recursive 

2 

term, the resulting equation for J , would appear more 

a 

complex, but would follow a related form as this example. 

The preceding development provides new insight to the 
actual effect of additive output noise on the 
characterization of linear recursive systems. 



IV. EVALUATION OF MODEL EQUATIONS 



A. EXISTING TECHNIQUES 

Given a set of input and output measurements and a model 
equation that is a function of these measurements and linear 
in a set of coefficients, Chapter III showed how to obtain 
estimates for the coefficient values and the error residual. 

The literature reports [Ref. 17] that the ARMA model can 
reasonably represent most linear systems of interest using 
orders less than m = 10. A current problem is the 

efficiency of computation when larger and more general model 
forms like VOL and BVM are considered. Regardless of the 
degree or memory of the model, the calculation of the model 
fit involves solving the normal equations [3-15}. 

This section discusses the traditional direct least 
squares model evaluation technique. The next section 
develops a unified solution technique for the more efficient 
recursive evaluation of a wide class of models. The last 
section compares the computational features of these 
evaluation techniques. 

The traditional modeling technique starts by selecting a 

T 

first model y(n) = Q x(n). We include the index parameter 1 
to identify this first model, and write the prediction form 
equation as follows. 

y(n,1) = ( 1 ) T x. ( n , 1 ) + e(n,1) {4.1} 
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Using {4.1} in place of { 3 . 5 }, and following the same 
least squares development as Chapter III, yields the normal 
equations corresponding to Eq . { 3 . 1 4} . The index parameter 

is included where needed. 

[x( 1 ) T X( 1 )]£( 1 ) = X ( 1 ) T y {4.2} 

N J N 



This leads to the model error evaluation and coefficient 
estimation equations in terras of this indexed notation; 

J 2 (i) = 1 y T y - £( 1 ) T R( 1 ) -1 £( 1 ) {4.3J 

N 

0(1) = R(1) _1 £(1) {4.4} 



where 

R(1) = J_ X(1) T X(1) {4.5} 

N 

T 

andr_(i) = 1X(i)y {4.6} 

N 



2 

If the fitting error J (1) is too large for the 

application, the traditional systems identification 

technique is to select a larger model that contains the 

terms of the first model plus some additional terms. This 

second prediction form model is written as shown below. 

T 

y ( n , 2 ) = 0(2) _x(n,2) + e(n,2) {4.7} 

The technique forms equations like {4.3} and {4.4} for the 

2 

model of {4.7}, and continues until the fit J (i) of model 
number i is within some acceptable limits. This is a brute- 
force and inefficient approach since the evaluation of the 
second (and subsequent models) does not take advantage of 
the solution calculated for the previous model(s). 
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To appreciate the above point, we digress momentarily to 
demonstrate the computational complexity (as measured by the 
number of multiplications or divisions) involved with 
calculating the inverse of the matrix R ( i ) when the model 
form is the BVM(d,m) introduced in Chapter II. Equation 
(2.9) gives the number of coefficients in a BVM as a 
function of the choice of the degree and the memory. Table 
1 shows the number of coefficients, and therefore the size 
of the corresponding R(i) matrix, for any BVM of degree up 
to 6 and memory up to 10. In this chapter, the notation 
c(i) is used for the size of the ( i ) C ^ model regardless of 
its f orm . 





d = 1 


d = 2 


d = 3 


d = 4 


d = 5 


d = 6 


m = 0 


1 


2 


3 


4 


5 


6 


m= 1 


3 


9 


19 


34 


55 


83 


m = 2 


5 


20 


55 


125 


251 


461 


m = 3 


7 


35 


1 19 


329 


791 


1715 


m = 4 


9 


54 


219 


7 1 4 


200 1 


5004 


m = 5 


1 1 


77 


363 


1 364 


4367 


12375 • 


ra = 6 


1 2 


104 


559 


2379 


8567 


27131 


m = 7 


15 


135 


815 


3875 


15503 


54263 


m = 8 


17 


170 


1139 


5984 


26333 


100946 


m = 9 


19 


209 


1539 


8854 


42503 


177099 


m= 1 0 


21 


252 


2023 


1 2649 


65779 


296009 



TABLE 1: Humber of coefficients in a BVM of degree d and memory 



The computational cost of inverting a matrix R(i) of 
size c(i), is of the order of 1/3 times the cube of c(i) 
multiplicative operations. Table 2 shows the approximate 
number of such operations required by the direct least 
squares technique for the inversion of the matrix R(i) 
corresponding to a BVM of degree d and memory m. 





d = 1 


d = 2 


d = 3 


d = 4 


d = 5 


d = 6 


m = 0 


1 


3 


9 


22 


42 


72 


ra= 1 


9 


243 


2287 


13100 


55450 


190600 


m = 2 


42 


2667 


55460 


651000 


5271000 


3 . 266E7 


m = 3 


115 


1 . 429E4 


5.617E5 


1 . 187E7 


1 . 650E8 


1 . 6 8 1 E 9 


m = 4 


243 


5 .249E4 


3 . 50 1E6 


1 . 2 1 3E8 


2.671E9 


3 . 176E10 


ra = 5 


444 


1 . 522E5 


1 . 594E7 


8 . 459E8 


2.776E10 


6. 317E1 1 


ra = 6 


733 


3.750E5 


5.822E7 


4 . 488E9 


2.096E1 1 


6.657E12 


m = 7 


1 125 


8 . 200E5 


1 . 804E8 


1 . 9 40E 1 0 


1 .242E12 


5. 326 E 1 3 


m = 8 


1638 


1 . 6 38 E6 


4 . 925E8 


7. 1 4 3 E 1 0 


6 . 087E1 2 


3 . 429E1 4 


ra = 9 


2287 


3 . 043E6 


1 .215E9 


2 . 3 1 4 E 1 1 


2.560E1 3 


1 .852E15 


ms 10 


3087 


5 . 334E6 


2 . 760E9 


6 .746E1 1 


9 . 487E1 3 


8.646E15 



TABLE 2: Number of multiplication operations required for the 
matrix inversion involved in the direct least squares 
evaluation of a BVM of degree d and memory m. 

It is clear that for degrees above 3. the inversion of 
the matrix R(i) required for this direct least squares 
evaluation of model i, rapidly becomes prohibitively 
expensive for increasing d or m. For problems of interest, 
however, we want to evaluate such higher degree and/or 
memory models of the BVM form. 

It should be mentioned that for large c(i), the 
computation of the elements of the R(i) matrix requires 
approximately c(i)N operations. This can dominate the 
computation time if N >> c(i), as is typically the case in 
the literature. Even though the correct model forms were 
used in the three examples of experiment number 1, the 
Autocorrelation, Prewindowed, and Postwindowed methods still 
required a large N to obtain a small fitting error. On the 
other hand, the Covariance method gave superior performance 
without requiring N >> c(i). These results are typical of 
those obtained with other computer simulated experiments. 
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and indicate that an equivalent performing model solution 
can be obtained more economically with the Covariance 
me tho d . 

B. PRESENTATION OP A RECURSIVE EVALUATION TECHNIQUE 

To develop efficient algorithms for evaluating models of 
the BVM form of equation {2.7} , w_® ma ke a. change in notation 

that will allow us to relate the equations and solutions of 
various models. This notational change is important for 
subsequent developments. We reorder jc(n,i) and 9^(1), 
respectively, in a manner described below. We denote the 



reordered 


_x ( n , i ) as 


w ( n , i ) , and 


the 


reordered 9(i) as jD(i), 


such that 


Eq. {4.7} 


becomes ; 








j(n,2) 


= t>(2) T w( 


n, 2) + e ( 


n,2) 




{4.8} 


where 


T 

w(n, 2) 


[ w(n , 1 ) | 


w( 


n , 2 / 1 ) T ] 


{4.9} 


and 


w( n, 1 ) 


x(n, 1 ) 






{4.10} 


and 


w( n, 2/1 ) is 


a vector formed by starting with x( 


n,2), 




deleting all of the terms 


that also exist in vr( 


n, 1 ) , 




and reducing the size 


of the resultant vector by 




eliminating 


the spaces 


0 f 


any deleted elements. 




and where 


II 

H 

C\l 


£(1/2) T ! 


CM 


{4.11} 


and 


£(1/2) = 


( 1 ) evaluated 


, nd. 

at the 2 iteration 


{4.12} 


and 


j)( 2 / I ) is a 


vector formed 


by starting with 9_( 2 ) 


f 



deleting all of the terms that also exist in jd( 1), 
and reducing the size of the resultant vector by 
eliminating the spaces of any deleted elements. 



96 



It remains to show that the evaluation of model equation 
{4.8} can be accomplished more efficiently than the 
evaluation of the same model given instead by Eq . {4.7}. 

Before demonstrating this result, the preceding notation is 
generalized for models beyond the first and second. 

We recursively define an equation for model i of size 
c(i) in terms of model i-1 of size c(i-l), where 
c(i) > c(i-l). 

T 

y ( n , i ) = J>(i) w(n,i) + e(n,i) {4.13} 

where w(n,i) and jo(i) are size c(i) vectors defined in the 
same manner as Eq . {4.8} through { 4 . 12}, such that, 

w(n,i) T = [ w(n,i-l) T ' w(n,i/i-l) ] {4.14} 



and 

_o(i) T - [ _o(i-l/i) T ! ^(i/i-l) ] {4.15} 

Following the standard least squares development yields 

the normal equations corresponding to equation { 4 . 13}: 

J_j"w( i)^W( i)lj>( i) “ J_ W(i) T jr {4.16} 

2J L J H 



where we have the c(i) x II transposed data matrix; 

W(i) = [ w( n 2 ’ 1 ^ w( n 2 + 1 » i ) ••• v(n^,±) ] {4.17} 

The solution of { 4 . 16} is the coefficient estimation equation; 

{4.18} 



_o( i) - [ W(i) T W(i) ] WCi) 1 ^ 



The solution for the model fitting error criterion is; 

J"(i) = J_ 2 T Z " d^(i) T D(i) 1 _d(i) 

II 



{4.19} 



where the c(i) x c(i) least squares matrix is; 

D(i) = 1 W(i) T W(i) {4.20} 

II 
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14.21 1 



and the c(i) x 1 vector _d(i) is defined; 

d(i) = 1 W(i) X 2 
N 

Instead of solving Eq . {4. 18} and { 4 . 19 }, we use 14.14} 

and {4.15} to develop a set of recursive model evaluation 
and model coefficient estimation equations^. Define q(i) to 
represent the number of terms in the (i)th model that are 
not contained in the (i-l)st model. 
q( i ) * c ( i ) - c ( i- 1 ) {4.22} 



6 The recursive model evaluation equations presented in 
this chapter were independently developed; but they turn out 
to have many similar features with a published algorithm by 
Hsia [Ref. 14, pp 27]. Hsia’ s algorithm is designed for 
recursively computing the least squares coefficient 
estimates of a model as the number of terms in the model 
increases. The equations developed in this chapter have 
equivalent recursive capabilities for coefficient 
estimation, and also include some features that are used 
later in the thesis for the more general model growth 
problem. 

The form of our equations turn out to be somewhat 
different since we have some matrices and vectors that do 
not appear explicitly in Hsia' s algorithm. These matrices 
and vectors are basic to the development of the main model 
growth results of Chapter VI. We also include a recursive 
equation for the model fitting error based on a simple 
extension of our particular form of the coefficient 
evaluation equations, without explicitly solving for the 
model coefficient estimates. Using our equations, we are 
able to show how it reduces to a special but widely used 
efficient recursive parameter estimation technique (Durbin’s 
[Ref. 2J modification of Levinson's Algorithm) when two 
restrictive (from a performance modeling growth perspective) 
assumptions are made. Since we need our form of the 
equations and these other features in the remainder of this 
thesis, the equations are developed at this point. 
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Substitute {4.14} into { 4 . 

T 



V(i) 



w(n 2 , i-1 ) 
w(n 2 +1 , i-1 Y 



w(n 3 , i-1 ) 



7}, and define the following. 

T 

w( n 2 , i/i- 1 ) 
w( n 2 + 1 , i/i- 1 



w(n 3> i/i-1 )' 



[ V(i- 1 ) ! ¥(i/i- 1 ) ] { 4 . 23 } 

where W(i) is the N x c(i) data matrix for the (i) t ^ model, 

s t 

W ( i — 1 ) is the N x c ( i - 1 ) data matrix for the (i-1) model, 

and W(i/i-l) is the N x q(i) data matrix for the new terms 

,\th ,.st 

in the (i) model that are not in the (i-1 ) model. 

Substituting Sq . { 4 . 23 } into Eq . { 4 . 16} and simplifying; 



A ( i - 1 ) 


| 3(i/i-1 ) 

1 






h(i-1 ) 


B ( i/ i-1 ) T 


j A ( i/i- 1 ) 


i) 




ji( i/ i-1 ) 



{4.24} 

where the following definitions are made for convenience. 

T 

A( i - 1 ) ■ _1_ W( i - 1 ) W( i - 1 ) , a c ( i - 1 ) x c ( i - 1 ) matrix {4.25} 

N 

3(i/i-l) ■ J_ W ( i- 1 ) T W( i/ i- 1 ) , a c(i-l) x q(i) matrix {4.26} 

N 

T 

A(i/i-l) = 1_ V(i/i-l) W(i/i-l) , a q(i) x q(i) matrix {4.27} 
N 



Ji(i-1 ) = j_ W(i-1 ) , a c(i-1 ) column vector 

N 

T 

h_(i/i-l) = J_ V(i/i-l) ^ , a a(i) column vector 

II 



{4.28} 

{4.29} 



The set of linear equations {4.24} is a special permuted 

. th 

form o f the normal equations for the ( i ) . model . This 
special form results from the ordering or vr(n,i) described 
by Eq . { 4 . 14 }, and leads to an efficient set of recursive 
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solution equations for J Z (i) and j>(i). It also provides the 

basis for our unified approach to the model determination 

and growth problem examined in the next chapters. 

Recursive Model Growth Solution and Evaluation Equations 

The set of simultaneous linear equations {4.24^ has a 

2 

compact solution for _t>(i) and J (i), based on the previously 

2 

obtained jj(i-l) and J (i-l). Appendix A contains this 
development and we only state and use the results here. 



It is convenient to use the following definitions; 
?(i) = — A ( i — 1 ) ^B(i/i-l) , a c(i-l) x q(i) matrix 

G(i) = A(i/i — 1 ) - B(i/i-l) A ( i — 1 ) B(i/i-l) 



{ 4 . 30 | 



= A(i/i— 1 ) + B(i/i-l) F(i) , a q(i) x q(i) matrix f 4 - 3 1 } 



_£( i ) = h(i/i-l) - B ( i / i — 1 ) t>(i-l) 

T 

= li ( i/ i — 1 ) + F(i) h(i-l) , a q(i) column vector 
jc ( i ) = G(i) ^_£(i) , a q(i) column vector 

As long as | A ( i / i - 1 ) | ^ 0 

then the solution of {4.24] is given by 



2( i) 



i>( i- 1 ) 

0 



F ( i ) 
I 



k(i) 



{4.32] 

{4.33] 

{4.34] 

{4.35] 



where 0. is the null vector and I is the identity matrix. 
The resulting minimum average sum squared error value is; 



J 2 (i) = J 2 (i-1) - £(i) T k(i) 



{4.36] 



In addition, the following recursive .relationships exist; 



A ( i ) 



-1 



A ( i — 1 ) ^ + F(i)G(i) F(i) T 



i F ( i ) G ( i ) 1 



G(i)" 1 F ( i ) T 



I 



G ( i ) 



-1 



{4.37] 



h(i) 



[ h( i- 1 ) 



h ( i / i - 1 ) ] 



{4.38] 
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C. CAPABILITIES OF THE RECURSIVE EVALUATION TECHNIQUE 

Ve now demonstrate some of the advantages of using 
equation [ 4 -. 35 } as an alternative solution to {4. 18] for 
model i. Ve showed that evaluation of equation {4. 18} for a 
3VM(d,m) requires the inversion of a matrix of size c(d,m) 
given by equation { 2 . 9 }. Examination of { 4 - 3 0 } through 
f 4 - 38} reveals that only one smaller inversion of size q(i) 
need be performed to evaluate Eq . {4-35} and Eq . { 4 . 36 }. 

This is the inversion of G(i) required for the calculation 
of jc(i) in Eq . {4.33}. 

For a BVM, the size of matrix G(i) at the (i)th iteraion 
is given by the following equation. 

size of G ( i ) = q(i) = c(d i ,m i ) - c(d i . 1 , m ^ ) {4-39} 

where d ^ , m^, d^_^ , and m^_^ are the BVM degree and memory 

at iterations i and i-1, respectively. The computational 
cost of recursively evaluating models using Eq . {4.35} and 

Eq. {4.36} is a function of the degree and memory of the 
various models 1 , 2 , . . . , i - 1 , i . 

Table 3 represents an example where we consider three 
different paths from the BVM with d=1 and m*1 f to the BVM 
with d = 4 and m*4. The paths are denoted with arrows and 
oversized letters. The order of -complexity involved in this 
example is shown in Table 4. A direct evaluation of the BVM 
with d=4 and m s 4 is given for comparison purposes. 




TABLE 3: Flow of four growth paths through the chart of the 

number of coefficients in a BVM of degree d and memory m. 



Path 


i 


Model 
d » m^ 


c(d i ,m i ) 


Size of Matrix 
to be inverted 
q( i) 


Inversion 
Operations 
[ q ( i ) l 3 / 3 


To tal 


A 


i 


(1,1) 


3 


3 


9 






2 I 


(2,2) 


20 


1 7 


1 638 






3 i 


(3,3) 


1 1 9 


99 


3. 234E+5 






4 j 


(4,4) 


714 


595 


7. 021 E+7 


7. 054E+7 


B 


1 


(1,1) 


3 


3 


9 






2 


(1.2) 


5 


2 


3 






3 


(1,3) 


7 


2 


3 






4 


(1 ,4) 


Q 

J 


2 


3 






5 


(2,4) 


54 


45 


3. 037E+4 






6 


(3,4) 


21 9 


1 65 


1 . 497E+6 






7 


(4,4) 


71 4 


495 


4.043E+7 


4 . 1 96E+7 


c 


1 


(1,1) 


3 


3 


9 






2 


(2,1) 


9 


6 


72 






3 


(3,1) 


1 9 


1 0 


334 






4 


(4,1) 


34 


1 5 


1125 






5 


(4,2) 


1 25 


91 


2.51 OE+5 






6 


(4,3) 


329 


204 


2 . 830E+5 






7 


(4,4) 


71 4 


385 


1 . 900E+7 


2 . 208E+7 


D 


1 


(4,4) 


71 4 


71 4 


1 . 21 3E+8 


1.21 3E+8 



TABLE 4: Order of Complexity for four growth paths. 
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Paths A, B, and C each result in a lower total 
computational complexity than the direct evaluation of the 
3VM(4,4) model described by path D. Other paths are 
possible but this example is representative of the 
computational savings that result from the use of the 
recursive algorithm. 

The development of the recursive algorithm is based on 
three assumptions. 

(1) All model equations are linear in their respective 

coefficient vectors. 

(2) The equation of the (i) model includes all of 

r . s t 

the terms contained in the (i-1) model, plus some new 

terms. This is described mathematically in equations 
{4.9} , {4. 10} , and {4. 1 4 } . 

(3) The determinant of A(i/i— 1 ) is not zero. 

We explicitly avoided any assumptions on the 

/ . . . s t • 

relationship between w(n,i-1), the terms in the (.i-1) 

model, and _w( n * i/ i- 1 ) > the new terms appearing in the (i) 

model. This results in a general recursive solution 

algorithm that is applicable for any type of model growth we 

care to consider The following chapter shows that the 



7 We still use the limitation on the form of each term 
that we defined in Sq. {3*6} for continuity of presentation, 
but mention here that other functional forms besides integer 
products of observations could be used as long as the 
resulting model equation is still linear in the unknown 
coefficients . 
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existing "recursive-in-order" algorithms (e.g. Levinson's 
[Ref. 2-4 and 20 - 25] and Lattice [Ref. 5-8 and 39] 
are special cases of the recursive evaluation equations 
presented here. 

Chapter V develops several new techniques for specifying 
possible model term vectors w(n,i/i-1); for recursive growth 
using the BVM. These are less restrictive than existing 
techniques, and allow for more accurate and compact modeling 
of typical systems of interest. 
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V . TECHNIQUES FOR GROWING MODELS 



A. OVERVIEW OF MODEL GROWTH 

The objective of our analysis is to determine patterns 
or other key behavior properties from the measured data, and 
use this information to efficiently formulate a suitable 
mathematical model. This model relationship is evaluated 
against both its ability to predict behavior of the output 
time series within some reasonable and statistically 
quantifyable degree of accuracy, and its compactness of form. 

Earlier work in model development was generally limited 
to an assumed linear relationship, and started with 
techniques like harmonic analysis and mathematical transform 
theory [Ref. 32, 33 and 3**]. In the late 1960's, time 
series statistical analysis methods were developed by Box 
and Jenkins [Ref. 17]. These methods are related to 
transformations of the spectral methods, and approach the 
characterization problem from the different perspective of 
prediction form models [Ref. 35]. These techniques are not 
closed form solutions to the system characterization 
problem, but are instead multistage approaches that have 
been widely used for the time series analysis of real world 
systems [ Ref . 36 ] . 

The Box and Jenkins technique assumes a general class of 
time series models which has been found, experimentally, to 
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be extremely rich. The procedure continues as a trial-and- 
error process with decision points where the analyst is 
required to select the next step based on the available 
information [Ref. 37]. 

Since existing linear time series techniques are not 
closed form solution algorithms permitting full analysis 
without human intervention and decisions, it would not be 
surprising that we are unable to find a complete closed form 
algorithm for the more general nonlinear relationship case. 
But nonlinear systems characterization is interesting and 
important, and a solution is still worth pursuing. 

Chapter IV introduced a general, recursive set of 
equations for efficiently evaluating related sets of model 
equations. Efficiently handling the system characterization 
problem requires a method for determining what new model 
terms to add at each iteration; how to "grow” the model. 

This chapter discusses existing techniques for recursive 
model growth (e.g. "recursive-in-order") which have been 
applied to some linear and nonlinear systems. Six variants 
on this type of " block-f orm' - technique are developed for the 
more general BVM form, and the capabilities and limitations 
of all these techniques are investigated. 

B. EXISTING TECHNIQUES FOR MODEL GROWTH 

The systems identification literature [Ref. 2-8, 20 - 
25, 38 and 39] contains two different techniques for both 

specifying and recursively estimating model coefficients. 
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and no explicit techniques for just the recursive evaluation 
of model fitting error. Both techniques are based on the 
concept of considering new model terms that have a unique 
"increasing order" relationship to existing model terms, and 
take advantage of the special Toeplitz matrix structure that 
can be made to occur in the resulting equations for the 
coefficient estimates [Ref. 2-4 and 20 - 25]. This 
Toeplitz structure is limited to a restrictive class of 
models, and requires the use of the Autocorrelation error 
minimization method. The iterative solution technique is 
based on Durbin's s impl i c i a t i on of Levinson's algorithm 
[Ref. 2], and is well known in the literature. This 
technique is a special restrictive case of the general 
recursive algorithm presented in Chapter IV, and the details 
of the relationship are given in Appendix B. 

Theorem 1 and Experiment 1 show that the use of the 
Autocorrelation method typically produces a suboptimal fit 
when the data sequences are finite. In terms of nonlinear 
models, the requirement for Toeplitz (or even Block- 
Toeplitz) structure for the least squares matrix severely 
limits the choice of allowable models. The "regular-form" 
kernel Nonlinear ARM A model used by Perry [Ref. 8] is a 
typical example of a restricted choice of terms in the 
model. This is discussed again in the next section. 



D 












The second recursive coefficient estimation and growth 
technique is based on "Lattice-filtering’’ [Ref. 5 - 8, 38 
and 391. This is a prediction-error version of Levinson's 
algorithm using a lattice structure implementation rather 
than the more conventional tapped delay line type of 
implementation. The error residual signal after each stage 
of the lattice has converged, is used in the computation of 
the lattice parameter estimates and signals used in 
subsequent stages. The lattice technique is a special 
implementation of the general algorithm of Chapter IV which 
has limited applicability. This technique offers no 
advantages for the unrestricted model growth we wish to 
consider . 

C. RECURSIVE MODEL GROWTH WITH THE BVM 

Chapter II introduced the BVM and showed that it 
subsumes the MA , AR, ARMA, Volterra, and Bilinear model 
forms. The recursive model evaluation and coefficient 
estimation algorithm presented in Chapter IV can be used for 
efficient and meaningful model growth of any of these model 
forms . 

This section presents six extensions of the recursive- 
in-order techniques which apply directly to the general BVM 
form. In all cases, the first model is evaluated by a 
direct least squares fit using Eq {4.1} through Eq. {4.6}. 
Subsequent models are evaluated using the recursive 
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relationships presented in Eq {4.8} through {4.37}. We 
restricted the upper limit of the lag on the model terms to 
be equal to the memory m for both the input and output 
terms, for clarity of presentation. This restriction is 
removed in the model growth technique presented in the next 
chapter . 

The first growth technique starts with the base model, 
BVM(d.m) = BVM(1,1) and uses the following fixed procedure. 
The fitting error J(d,m) = J ( 1 , 1 ) is evaluated for this 
first model, and for subsequent models BVM(1,1+i) where 
i=1.2,3.--. until J(1,1+i) stops decreasing significantly 
(the meaning of which will be discussed later). This last 
significant model is denoted as the new base model BVM(1,m). 
The fitting error J(1+j,m) is evaluated for subsequent 
models BVM(1+j,ra) where j = 1 , 2 , 3 . - • • until J(1+j,m) stops 
decreasing significantly. This last model is denoted as the 
new base model BVM(d,m) and the iteration starts on the 
memory m again. This two-phase iteration is continued as 
long as meaningful reduction in fitting error is obtained. 
This search strategy is denoted as the "M Directed Growth" 
because the initial phase involves growth in memory m (Fig. 
21 .a) . 

A second growth technique involves a similar algorithm 
with the difference that the first phase iteration is on the 
degree d (Fig. 21. b). This technique is therefore denoted 
as "D Directed Growth". The choice of an appropriate 



significance test for switching between the two* phases of 
these- directed growth algorithms remains an open question. 

A third growth method is denoted as "Diagonal Growth", 
and again starts with the evaluation of the base model 
3VM(d,m) * 3 VM ( 1 , 1 ) . The fitting error of successive models 
3VM( 1 +i , 1 + i) for i=1,2 , 3 , • • • is evaluated until J(l+i,1+i) 
is acceptably small (Fig. 21. c). 




Figure 21. a Figure 21. b Figure 21. c 

F IGURS 2 1 : Three Growth Methods for the BVM 

A fourth technique is denoted as "M-D Zig-Zag Growth" 
and starts with the evaluation of the base model BVM(l,l). 
The memory m and degree d are alternately incremented until 
the fitting error of the resulting model is acceptable (Fig. 



22. c) . 



A fifth technique denoted as "D-M Zig-Zag Growth" 
involves a similar algorithm with the key difference that 
the first phase iteration is on the degree d (Fig. 22. b). 

A sixth growth strategy is denoted as "Neighbor Growth". 
It starts with the base model BVM ( 1 , 1 ) and evaluates its 
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fitting error J ( 1 , 1 ) . Next we evaluate models that differ 
from the base model by one increment of degree, one 
increment of memory, or both. In this starting case we 
evaluate J(l,2), J ( 2 , 1 ) , and J(2,2). We denote the model 
with the lowest fitting error J as the new base model and 
continue this iteration process until the decrease in J is 
no longer significant (Fig. 22. c). 




FIGURE 22: Three Additional Growth Methods for the BVM 



These six techniques all fall under the general type of 
growth we refer to as " b lo c k- f o rm " techniques. At each 
iteration the model growth is accomplished by automatically 
including one of a predetermined set of model terms. 

One other model growth technique has been proposed by 
Perry [Ref. 8]. A special and restricted form of the 
nonlinear ARMA model (earlier version of the BVM) and the 
Autocorrelation error minimization method are used to 
develop a special multichannel lattice form parameter 



estimation solution. Reference 8 does not contain any 
experimental verification of this technique. We analyze 
this technique to show its intrinsic weaknesses. 

A quadratic nonlinear ARMA model of this special form 
can be written as shown below (a translation of Ref 5, pp. 
193, Eq {4.29} into the notation of this thesis). 



M, M 2 M, 

y(n)= 7 9 l; 0 ( h l )u ( n_h 1 ) + Z' Z 



V° 



h 2 = ° h i = 0 



2 ; o 



M, M 3 M, 

\Z, ®o;l (h l )y(n - h l ) + 9 0;2 (h l 



V 1 



h 3 =0 h 1 =0 



»4 JJl 



♦ Y T 9. . . (h ,h )u(n-h )y(n-1 -h -h ) 
V Hr, v Hr, i, 1 1 4 1 14 



h 4 = 0 h^O 



(h ,h 2 )u(n-h 1 )u(n- 



, h y ( n- 1 -h 1 )y( n- 1 



+ e( n) {5 



The proposed technique of Reference S requires that the 
user prespecify the integer values of the upper summation 
limits and Then Eq . |5«l} becomes a recursive- 

in-order model equation that can partially evaluated by 
least squares lattice techniques. Some of the model terms 
do not fit the restrictions of the lattice solution and must 
be evaluated separately by direct least squares. Despite 
the attractiveness of the potentially efficient lattice 
form, the requirement to prespecify all but one of the upper 
summation limits of Eq |5-l} excessively complicates any 
systematic model growth with this method. Reference 8 does 
not provide any suggestions on how to select these upper 
limits. Since this technique automatically involves a fixed 



set of terms at each iteration, it falls under our 
definition as a block-form technique. It basically is an 
attempt at fitting a problem (parameter estimation) to a 
particular form of solution (lattice form recursive-in- 
order), and is inferior to the growth methods of this 
chapter and the following. 

These block-form techniques can all eventually subsume 
any system of the BVM form. They unfortunately require a 
significant amount of computations as the degree and memory 
increase (See section VI. D). The first five techniques are 
all nonlinear extension of the linear recursive-in-order 
concept discussed earlier, and are brute-force methods. The 
Neighbor Growth technique offers an appealing approach for 
potentially more efficient model growth, but also suffers 
from the problem of high computational cost when used to 

* 

evaluate the addition of a large number of model terms. We 
continue the discussion of efficient model growth along a 
related line in the next chapter. 
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VI. SEARCH INDICATORS FOR EFFICIENT MODEL GROWTH 



A. DISCUSSION 

The block-form growth techniques of Chapter V follow 
from the conventional concept of model growth in linear 
systems. Closer examination reveals that these techniques 
have a number of serious flaws when used for nonlinear model 
growth. Given a finite amount of measurement data, we can 
only consider evaluating models when the number of model 
terms is less than or equal to the number of data samples. 
Table 1 reveals that many of these nonlinear models can only 
be evaluated with extremely long sequences of data 
measurements. For example, the model BVM(3,5) has 363 
different terms. Only a few of these terms may be needed in 
modeling any third degree dynamic system whose equation 
involves the factor u(n-5) or y(n-5). However, all of these 
363 terms are involved in the full model evaluation when the 
block form growth techniques are used, and sufficient data 
measurements must be available. The following points 
summarize the results of many computer simulated model 
growth experiments (See Chapter VII later). 

As the degree or memory increase, all block-form 
modeling techniques automatically consider an increasing 
number of terms at each subsequent growth iteration. This 
results in rapidly increasing computational cost, and often 



produces an ill-conditioned least squares matrix A(i) due to 
the inclusion of several model terms with nearly equivalent 
properties in terms of values in this matrix. By ill- 
conditioned, we mean the condition number (the ratio of the 
largest to smallest eigenvalue) is numerically large (e.g. 
greater than 10000). 

A higher condition number for matrix A(‘i) is related to 

less accurate estimates for the coefficients j>(i), and 

2 

higher fitting error J (i). In some cases the matrix A(i) 
becomes so ill-conditioned that it is no longer positive 
definite, and the resulting model evaluation is no longer 
optimum in any least squares sense. It has been 
experimentally verified that the general use of these block- 
form modeling techniques often produces poor results for 
nonlinear systems; namely offset model coefficient 
estimates, high fitting error, and the inclusion of terms 
that are not actually needed. An example is provided in 
Chapter VII. 

One possible approach to overcoming the preceding 
problems is to start with some base model such as BVM(l,l), 
and use one of the block-form techniques just to specify a 
new set of q(i) model terms. Instead of evaluating these 
candidate terms as a set, assume that only a subset of them 
is actually needed. The problem is how to find the 
particular subset of these terms required in the final 



model . 



This approach is related to standard stepwise regression 
analysis [Ref. 40], and also to a recently published 
technique known as GMDH , the Group Method of Data Handling 
[Ref. 41]. These preceding techniques are general enough to 
permit consideration of a wide variety of model terms and 
both can avoid the ill-conditioned solution, but they still 
have the following major problem. With q(i) potential new 

q ( i ) 

model terms, there are 2 possible model equations to 

consider. Except for small values of q(i), the exhaustive 
evaluation of each of the corresponding solution equations 
rapidly becomes prohibitively expensive. 

There is the additional problem of a stopping criterion. 

Examination of Eq. {4.33} and Eq. {4.36} shows that the 
2 

fitting error J“(i) is monotonically decreasing when new 
model terms are added and matrix G(i) ^ is positive 
definite. Therefore, while only some number r out of these 
particular q(i) candidate model terms may be needed in the 
final model, the fitting error with r+1 terms will still be 
lower (with the exception of numerical errors or an exact 
mode 1 fit). 

In all practical situations, we have only finite 
measurement data and finite computer resources, and are left 
with an interesting and not unusual problem. When the 
preceding model growth techniques yield ill-conditioned 
solutions or increase to a point (1) greater than the finite 
data can handle, or (2) beyond the computational resources; 



are there any prudent procedures that we can employ ? The 
field of artificial intelligence has provided some 
motivation in related problems including chess playing 
programs and voice-recognition methods. The basis for 
comparison is the intractability of the exhaustive solution 
when there is only finite data, time, and computational 
power. Since the performance modeling problem is 
interesting, and has practical applications, we develop a 
semi-heuristic technique to follow when there are not enough 
resources for the exhaustive solution. 

This chapter presents the new concept of '’search 
indicators’ 1 for efficiently growing a model of an unknown 
linear or nonlinear system from a finite set of input-output 
measur ement s . Rather than attempting to solve the typically 

large set of normal equations for all of the new candidate 
model terms, the proposed concept uses an easily computable 
search indicator (scalar value), or a set of such search 
indicators, for each candidate model term contained in 
w^(n,i/i-1) at each growth iteration i. The relative values 
of these search indicators are used to systematically 

g 

exclude those terms expected to have insignificant effect 

8 The word expected is used to acknowledge the heuristic 
nature of some of these search indicators and of their use. 
We have been unable to prove that any technique based on 
their use is guaranteed to pick the optimum model terms at 
each growth iteration. These indicators are logical factors 
based on the recursive model evaluation equations, and the 
results of many experiments show that techniques using some 
search indicators provide for highly efficient model growth. 



on reducing the fitting error. The remaining term or terms 
are retained in the vector w(n,i/i-1) for this (i) C ^ model, 
and used to form a much smaller set of normal equations that 
are efficiently evaluated by the general recursive equations 
presented in Chapter IV. 

This proposed two-phase concept offers a number of 
improved capabilities over the existing growth techniques. 

(1) Since we compute the search indicators for each 
candidate model term seperately, we can consider 
more potential model terms than the number of data 
measurements. As a result, the terms of nonlinear 
models with large degree and memory can now be 
considered. We must, of course, eliminate enough 
terms such that the reduced model form evaluated in 
the second phase has fewer unknowns than the number 
of data measurements. 

(2) This technique allows the evaluation of widely 
different model terms at any iteration. Unlike the 
recursive-in-order (or more general block-form) 
techniques, there is no longer the implicit 
restriction that the current model contain all of 
the possible set of input, output, and bivariate 
terms specified by a particular degree and memory. 

( 3 ) The initial set of model terms w(n,1) can be 
better chosen by the search indicator concept, 
rather than by blindly picking a predefined base 



model like BVM(1,1). The computer simulated 
experiments of Chapter VII show that this property 
allows for the efficient characterization of a 
general class of systems having an input-output 
delay L (e.g. where terms containing the factor 
u(n-k) for k=0, 1 ,2, . . . ,L-1 are not needed in the 
final model). In cases where the system under 
consideration has a delay factor L, the block-form 
techniques fail to recognize and exploit this 
property, and often converge on a more complex 
model . 

(4) The search indicator technique selects one or 
more candidate model terms in the first phase, 
produces a much smaller matrix A(i/i-1), and 
therefore significantly reduces the computational 
burden. It is also capable of efficiently handling 
the previously discussed problem of i 1 1 -c ond i t i on i ng 
caused by nearly equivalent model terms. 

These features are demonstrated in the following sections. 

The next section defines various possible search 
indicators based on the signals, vectors, and matrices 
contained in the recursive model growth solution and 
evaluation equations introduced in Chapter IV. Some 
physical interpretation is given for each of these search 
indicators, and the set is reduced to a smaller set worthy 
of further investigation. Results of many computer 
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simulated experiments have shown the superior growth 
capabilities of the proposed concept, and confirmed that the 
search indicator technique provides significant 
computational savings and accuracy improvements over all of 
the previously discussed growth techniques. Examples of 
model growth are provided in the computer simulated and real 
world experiments of Chapter VII. 

B. DEVELOPMENT OF SEARCH INDICATORS 

The notation Wj(n,i/i-1) is used to represent the (j)^ 

model term (out of the candi-date set of terms) that we 

consider adding at the ( i ) C ^ iteration, given that we have 

s t 

previously evaluated and accepted a model at the (i-1) 
iteration. We let q(i) still represent the number of 
candidate model terms considered at the (i) th iteration, and 
therefore j=1,2,3t.-..q(i). 

The following development is partially based on the 
notation for the signals, vectors, and matrices contained 
within the recursive solution and evaluation equations of 
Chapter IV. One important note of clarification needs to be 
made at this point to minimize potential confusion. 

The set of equations {4.8} through {4.38} in Chapter IV 
was developed to evaluate the improvement in model fitting 
error and calculate the new coefficient estimates based on 
adding the entire candidate set of terms v*(n,i/i-1) to the 
model with the existing set of terms w(n.i-l). The set of 
search indicators developed in this chapter is to be 
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calculated for each of the candidate model terms Wj(n,i/i-1) 
in the candidate set w(n , i/i-1 } . These indicators are 
designed to each give some partial metric or measure for the 
improvement in model fitting error. As a result of this 
development, many of the matrices and vectors defined in Eq . 
{4.3} through Eq . {4.38} for the evaluation of multiple 

model terms are used in this chapter in a reduced form (e.g. 
vectors and scalars, respectively) for the search indicator 
evaluation of each term. Whenever possible we use the lower 
case vector version of the matrix designation to represent 
the corresponding reduced form vector (e.g. Wj(i/i-1) for 
W(i/i-1)). Likewise we use the scalar representation to 
describe the corresponding reduced form of a vector (e.g. 
hj(i/i-1) for h(i/i-1)). 

These reductions are made only for clarity in the 
development of the search indicators. Once a subset of 
model terms is selected by the search indicators, the full 
form equations of Chapter IV are used to evaluate the 
fitting error and coefficient estimates. It is noted, 
however, that some of the factors calculated in the 
evaluation of the search indicators for the candidate model 
terms can be used again in the actual evaluation of the 
model performance. Thus, some of these computations will 
serve double duty. 

A primary concern is efficiency of computation, so the 
numerical complexity (number of multiplications and 
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divisions) involved in calculating each search indicator has 

been analyzed. The notational convention 0(N) will be used 

to denote N multiplication or division operations. This 

complexity notation is included with the development of each 

search indicator, and summarized with examples in Table 5 

and Table 6 after the development of all of the indicators. 

s t 

Denote the size c ( i — 1 ) of the (i-1) model as P t and 

the number of data points in the error minimization as N. 

s t 

Since we have completed the the evaluation of the (i-1) 

model, the following matrices and vectors are available. 

W ( i - 1 ) = a N x P matrix given by Eq. (4.23) 

A(i-1) = a P x P matrix given by Eq . {4.25} 

Ji(i-I) = a P x 1 column vector given by Eq. {4.28} 

We also have A ( i — 1 ) 1 and j>(i-1) t the coefficient vector. 

Some preliminary vectors needed for the development of 

the search indicators are presented at this point. 

T 

Wj(i/i-1) = [Wj(n 2 ,i/i-1),Wj(n 2 +1,i/i-1),...,Wj(n^,i/i-1)] {6.1} 

= a N x 1 transposed vector of the signal specified 
t h 

by the (j) candidate model term over the 

interval (n^.n ) . This is the reduced version of 

the data matrix W ( i / i - 1 ) given by Eq. {4.23} » in 

t h 

the case of just the (j) candidate model term. 

e(n,i-1) = y(n) - w(n,i-1) £(i-1) for n^<=n<=n^ {6.2} 

= value of the error residual at discrete time n 

s t 

from the (i-1) model iteration. This can be 
computed with P multiplications per point. 



122 



T 

ei(i-l) = [ e(n e( n^ + 1 , i-1 ) , . . . , e(n 3 ,i-1) ] { 6 . 3 } 

= a N x 1 column vector of the residual sequence 

s t 

values from the (i-1) iteration over the 

interval (n 0 ,n^) , This requires PN multiplications. 

b.(i/i-1) T = W ( i - 1 ) T w • ( i / i - 1 ) {6.4} 

J N 

= the reduced version of matrix B(i/i-1) given by 

c h 

Eq . {4.26} in the case of just the (j) candidate 

model term. Since this is a P x N matrix times a 
N x 1 column vector, the cost is 0(PN+1). 
f j (i/i-1) T = -A( i-1 ) _i bj( i / i-1 ) {6.5} 

= the reduced version of matrix F(i) given by 

Eq . {4.30} in the case of just the (j)^ candidate 
model term. Since A ( i — 1 ) ^ is a P x P matrix that 
we already computed, and bj(i/i-1) is a P x 1 
column vector obtained in Eq . {6.4} at a cost of 

OCPN+1), the total cost of computing fj(i/i-1) is 
0 ( P 2 ) + 0 ( P N+ 1 ) = 0(P 2 + PN+1). 

Twelve different search indicators were developed and 
examined in this work. The initial set of search indicators 
I C j , 1 ) through I ( j , 8 ) was developed from an algebraic 
perspective; i.e. these relationships arose from an 
examination of the general recursive evaluation equations of 
Chapter IV, under the condition of adding a single new model 
term Wj(n,i/i-1). Each search indicator therefore has a 
direct relationship to the actual system characterization 
experiment . 
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The first search indicator is the time average of the 



the calculation of I ( j , 1 ) requires 0(N + l) operations for 
each candidate model term. 

This indicator corresponds to the scalar version 
hj(i/i-l) of the vector li(i/i-l) defined by Eq . {4.291- 

While intuitively appealing as the "empirical" cross- 
correlation between the output of the system under test and 
the signal specified by the candidate model term, this 
indicator has a basic flaw. It is a function only of the 
output of the system and the candidate model term, and as 
such, does not depend on the particular terms in the 
previous model. Numerous computer simulated experiments 
have verified that l(j,l), taken alone, is unsuitable as a 
reliable search indicator for model growth. 

The second search indicator is the value corresponding 
to the reduced version g^(i) of the vector _g( i) of 
Sq. {4.32}, in the case of just the (j) candidate model 




term, and the output signal of the system* 




{ 6 . 6 } 




term. 




N 



= I( j, 1 ) + f ) T h(i-1 ) 



{6.7} 



where _Wj(i/i-l) is a N x 1 vector and jr is a N x 1 vector. 



Since _fj(i/i-l) is a P x 1 vector obtained at the cost 
2 

0(P +PN+1), and h(i-l) is a P x 1 vector, the calculation of 

2 

l(j,2) requires a total of 0(P +PN+P+N+2) operations for 

each candidate model term. 

We digress momentarily to examine some of the 

characteristics of the full vector j*(i) given by Eq. {4*32}. 

Substituting {4.26} and { 4 . 29 } into { 4 . 32 } produces; 

£(i) * 1 V(i/i-1 ) T £ - J_ W(i/i-1 ) T ¥(i-1 )js(i-1 ) 

N N 

= j_ V ( i / i - 1 ) [ 2 “ W( i-1 )j>( i- 1 ) ] 

N 

= 1 W(i/i-1 ) T e(i-1 ) 

N 

T 

= J_^e(i-l) W (i/i-1 ) {6.8} 

» 

Since {e(n,i-l){ is the prediction error sequence of the 
s t 

(i-l) model, and the vector _e(i-l) contains the values of 
e(n,i-l)}, we see that jj[(i) is a vector whose (j) element 
is the normalized inner product of _e(i-l) and the (j) 
column of W(i/i-l). Examination of Eq . { 4 . 23 } and Eq . { 6 . 1 } 

reveals that the (j) column of W(i/i-l) is the vector 
Wj(i/i-l), and yields the following expression. 

g (i) = J_ _e( i-1 ) T w ( i/i-1 ) 

1 1 

g (i) = j_ _e ( i- 1 ) 

1 N 



'q(i) 



(i) = 



T 

e( i- 1 ) w ( i/i-1 ) 
- — q(i) 



16.9} 



1 25 



where ^(i) = [ g^i), g 2 ( i) , • • • , g j ( i) q ( i) ]{6.10j 

and where q(i) = c(i) - c(i-l) { 6 * 1 1 } 

Examination of Eq . { 6 . 9 ! shows that the value gj(i) is 

the time average of the product of the error residual signal 

and a signal formed by products and powers of products, of 

the input-output measurements corresponding to the 

t h 

specification of the (j) candidate model term. This gives 
physical interpretation and increased meaning to the value 
gj(i), which is contained in Eq . [ 6 . 7 } (search indicator 

two), and equivalently in Eq . f 6 • 12} below as search 
indicator three. 

I(j,3) * 1 w j ( i/i-1 ) e_ ( i-1 ) = J_ £ 3 wj (n,i/i-1 )e(n,i-1 ) = g^ 
N N n-n^ 

2 [6.12] 

Since _Wj(i/i-l) is a N x 1 vector and ^e(i-l) is a N x 1 
vector obtained at the cost 0(PN), the calculation of 
I ( j > 3 ) requires 0(PN + N + l) operations for the first candidate 
model term. For the second and subsequent candidate model 
terms the cost is reduced to 0(N + l) since je(i-l) has already 
been calculated. Tables 5 and 6 show that l(j,3) can be 
computed much more efficiently than l(j,2). It is also 
intuitively appealing to be using the error residual from 
the previous model in evaluating the usefulness of a new 
candidate model term. 

r \ s c 

If the (i-1) model produces an exact match to the 
measured input-output data {u(n)| and fy(n)j, then gj(i) is 



zero. 



This occurs regardless of the choice of the new term 



model . It 



Wj(n,i/i-l) considered for inclusion in the (i) 

follows that the absolute value of gj(k) should be a useful 

measure at any step k < i in the growth iteration. It is 

conjectured that this represents a measure of the relative 

benefit of that particular model term as compared with other 

possible choices of terms. In this regard the term that 

produces the largest absolute value of gj(k) would also 

2 

probably result in the smallest value of J (i), the error 
fitting criteria. This last point remains to be 
demonstrated . 

The above discussion indicates that l(j,3) should be a 

potentially good search indicator, either alone, or in 

combination with other factors. We will later consider 

other search indicators based on g^(i). 

The fourth search indicator l(j,4) is the time average 

/ \ th 

of the square of the signal specified by the (j; candidate 
model term. 

T n 1 r -i ^ 

I(jf4) * Wj( i / i “ 1 ) wj(i/i-l) = J_ £ Jwj ( n, i/i-1 )] {6.13] 

N N n = n J" 



Since vfj(i/i-l) is a N x 1 vector, the calculation of I ( j , 4 ) 
requires 0(N+l) operations for each candidate model term. 
This corresponds to the reduced version of matrix A(i/i-l) 
given by Eq . {4.25]* It can be efficiently computed, but 

suffers the same flaws as l(j,l). 

The fifth search indicator is the scalar corresponding 
to the reduced version of matrix G ( i ) given by Eq . {4.32] in 

/ N Ch 

the case of one additional coefficient in the (i) model. 



I ( j , 5 ) = ± w .(i/i-1 ) T w -(i/i-1 ) + b .(i/i-l) T f .(i/i-1) 

H J J J 

= I ( j . 4 ) + b(i/i-l) T f (i/i-1) {6.14} 

Since b j(i/i-1) is a P x 1 vector, and fj(i/i-1) is a P x 1 

2 

vector, the cost of computing f^( i/i-1) is 0(P +PN+1) and 

includes the cost of computing b^(i/i-t>). Therefore the 

2 

calculation of I(j,5) requires a total of 0(P +PN+P+N+2) 

operations for each candidate model term. Examination of 

Eq. {4.33 and {4.36} reveals that the scalar value I(j,5) is 

inversely related to the reduction in the fitting error that 

results if the single candidate model term is brought into 

the model. As such, there is reason to expect that I ( j * 5 ) 

would be a good search indicator, either alone or in 

combination with other factors. Unfortunately, the high 

cost for I ( j * 5 ) precludes its general use. 

*he sixth search indicator is the scalar value 

corresponding to the reduced version of vector k(i) from Eq . 

{4.33} in the case of one additional model term. 

I ( j , 6 ) = I ( j , 3 ) / 1 ( j . 5 ) 

T 

£ W, ( i/i-1 ) e( i-1 ) 

N 

w . ( i / i - 1 ) T w . ( i / i - 1 ) + b^ . ( i / i - 1 ) T £ . ( i / i - 1 ) {6.15} 
N J 2 2 

where w^(i/i-1) is a N x 1 vector, £(i-1) is a N x 1 vector 

obtained at the cost O(PN), and fj(i/i-1) is a P x 1 vector 

2 

obtained at the cost 0(P +PN+1) which includes the cost of 
computing the P x 1 vector bj(i/i-1). Therefore, the 
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2 

calculation of I ( j * 6 ) requires 0(P +2PN+2N+P+4 ) operations 

for the first candidate model term. For the second and 

subsequent candidate model terms the cost is reduced to 
2 

Q(P +PN+2N+P+4 ) since e(i-1) has already been calculated. 

Examination of Eq . {4.36} reveals that the value of 

I ( j . 6 ) is directly related to the reduction in the fitting 
2 

error J (i) that results from including the candidate term 
in the model. Tables 5 and 6 indicate, however, that there 
is a very high computational cost associated with this 
search indicator . 

The seventh search indicator is the value of the change 

2 

in the error criterion J (i) as a result of including the 
candidate model term. It is based on Eq . {4. .32}, {4.33} * 

and {4.36}. 

I(J,7) = I(j,2)/I(j,5) 

T T 

w j ( i / i - 1 ) y + f. (i/i-1) h(i-1) 

N 



± w .( i/i-1 ) T w .( i/i-1 ) + Id . ( i / i - 1 ) T f . ( i/ i - 1 ) {6.16} 

N J J -J 

where w^Ci/i-l) is a N x 1 vector, y_ is a N x 1 vector, 

h(i-1) is a P x 1 vector, and £j(i/i-1) is a P x 1 vector 

2 

obtained at the cost 0(P +PN+1) which includes the cost of 

computing the P x 1 vector ]3j(i/i-1). Therefore the total 

2 

cost of computing I(j,7) requires 0(P +PN+2P+2N+5) 
operations for each candidate model term. 

This is the exact value of the reduction in the error 
criterion resulting from including the candidate model term. 
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